Preface



In recent years the production costs of IT (information technology) systems have
steadily decreased relative to their complexity. IT applications that formerly had to
be realised as expensive PCBs can now be realised as a system-on-chip. Furthermore,
low-cost broadband communication media are available both for wide area
communication and for the realisation of local distributed systems. Typically, the
market requires IT systems that realise a set of specific features for the end user in
a given environment, so-called embedded systems. End users are often not aware that
several IT components are embedded in their application systems. Examples of such
embedded systems are control systems in cars, aeroplanes, houses or plants,
information and communication devices like digital TV or mobile phones, and
autonomous systems like service or edutainment robots.

For the design of embedded systems the designer has to tackle three major aspects: 

• the application itself including the man-machine interface, 

• the (target) architecture of the system, including all functional and
non-functional constraints, and

• the design methodology including modelling, specification, synthesis, test 
and validation. 

For a systematic development of embedded systems, the second and third points are
of major interest. The first point gives rise to the requirement to support a broad
variety of methodologies for the different application areas of embedded systems. A
design methodology by itself has only limited benefit in an industrial development
environment: it has to be supported by a design environment before it becomes
common practice. Since the high-level specification of features in appropriate
languages is becoming more and more important in the design of embedded systems,
methodologies often centre on the introduction or use of modelling and specification
languages. Today most embedded applications can only be realised as distributed
systems. Designing the communication system in a distributed real-time environment
is a subtask of the synthesis step, with its own specific design problems.

This book documents the high-quality approaches and results presented at the
International Workshop on Distributed and Parallel Embedded Systems (DIPES 2000),
organised by IFIP working groups WG 10.3, WG 10.4 and WG 10.5. The workshop took
place on October 18-19, 2000, in Schloß Eringerfeld near Paderborn, Germany. During
the workshop the new IFIP TC 10 SIG-ES (Special Interest Group on Embedded Systems)
was founded. The SIG-ES is a forum for the discussion of all aspects of embedded
systems. Points of special interest include the design of embedded real-time software
and real-time operating systems, as well as the design of distributed embedded
systems. In future, DIPES will be an event organised by this new SIG-ES.







This book is organised similarly to the workshop. Chapters 1 and 4 (Methodology I
and II) deal with different modelling and specification paradigms and the
corresponding design methodologies. Generic system architectures for different
classes of embedded systems are presented in Chapter 2. Chapter 3 presents several
design environments supporting specific design methodologies. Problems concerning
test and validation are discussed in Chapter 5. The last two chapters cover
techniques for distribution and communication (Chapter 6) and implementation and
synthesis techniques for embedded systems (Chapter 7).
A METHODOLOGY FOR COMPLEX
EMBEDDED SYSTEMS DESIGN 

Petri Nets within a UML Approach 

R.J. Machado, J.M. Fernandes, H.D. Santos 

DSI/DI, Escola de Engenharia - Universidade do Minho, Portugal 



This paper focuses mainly on the analysis phase. It describes a UML-based
approach for designing complex embedded systems and, specifically, the
usefulness of shobi-PN v2.0 specifications, a Petri net extension, for modelling
dynamic behaviour. A relatively complex case study is used to illustrate the
suggested specification approach.



1. Introduction 

The vast majority of embedded systems are control-dominated, and designers
traditionally specify them using only a state-oriented model, such as FSMs. However,
real-time embedded systems are becoming increasingly complex, which calls for a
different approach. The system specification has to fulfil several requirements,
namely support for concurrency, timing constraints, hierarchy, data and control
flow, and distributed computations.

Thus, to model more aspects of the systems (namely, data and function), it is
important to consider genuine multiple-view models. IT organisations can also
improve efficiency and productivity if they share the same notation. In this context,
the authors recommend using selected UML views to specify embedded systems,
because UML covers the most relevant modelling aspects of systems and is an OMG
standard.




Figure 1. The HIDRO lines.
All the diagrams shown in this paper relate to the controller of the Blaupunkt-Bosch
car radio production lines (HIDRO lines, fig. 1), one of several complex embedded
systems that the authors have used as real case studies to validate the proposed
methodology.



2. UML 

UML is a general-purpose modelling language for specifying, visualising,
constructing and documenting the artefacts of software systems, as well as for
business modelling and other non-software systems (Booch et al., 1999). As a
standard language for defining and designing software systems, UML is progressively
being accepted in industrial environments. UML is meant to be used universally for
the modelling of systems, including automatic control applications with both
hardware and software components, so the authors believe that it is an appropriate
choice for embedded systems. Confirming the usefulness of UML for this engineering
field, several research teams (Douglass, 1998; Lyons, 1998; Lanusse et al., 1998;
McLaughin et al., 1998; Kabous et al., 1999; Jigorea et al., 2000) have also adopted
UML as the notation for specifying embedded systems, which shows that this notation
is gaining widespread acceptance and use within this community.

The main views used by the authors for specifying the system are captured by the
following diagrams: (1) use case diagrams capture the functional aspects of the
system as viewed by its users; (2) object diagrams show the static configuration of
the system and the relations among the objects that constitute it; (3) sequence
diagrams present scenarios of typical interactions among the objects that constitute
the system or that interact with it; (4) class diagrams store information about
ready-made components that can be used to build systems and specify the
hierarchical relationships among them; (5) Petri nets (shobi-PN v2.0) specify the
dynamic behaviour of some objects/classes.

Although the OMG's Real-time Analysis and Design working group has not yet come
up with a final proposal for directly incorporating real-time concepts into the UML
standard (namely regarding the syntax of OCL), the authors are using UML to deal
with hard real-time systems. So far, timed sequence diagrams and Oblog syntax have
been used to specify the canonical latency and duration constraints, which are viewed
as the building blocks from which more precise categories of timed requirements (for
specifying performance and safety constraints) are composed.
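The paper does not give the Oblog syntax for these constraints. As a rough sketch of what the two canonical forms require, assuming a time-ordered trace of named events (timestamps in an assumed unit), a latency constraint bounds the delay between a stimulus and its response, while a duration constraint bounds how long the interval between two events may last. The event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # timestamp; the trace is assumed to be sorted by time
    name: str


def latency_ok(trace, stimulus, response, max_latency):
    """Latency constraint: every `stimulus` is followed by a `response`
    no later than `max_latency` time units afterwards."""
    for i, ev in enumerate(trace):
        if ev.name != stimulus:
            continue
        # look for any response inside the latency window after this stimulus
        if not any(later.name == response and later.time - ev.time <= max_latency
                   for later in trace[i + 1:]):
            return False
    return True


def duration_ok(trace, start, end, min_duration, max_duration):
    """Duration constraint: every start..end interval lasts between the bounds."""
    opened = None
    for ev in trace:
        if ev.name == start:
            opened = ev.time
        elif ev.name == end and opened is not None:
            if not min_duration <= ev.time - opened <= max_duration:
                return False
            opened = None
    return True


trace = [Event(0, "palletArrives"), Event(40, "elevatorStarts"),
         Event(100, "palletArrives"), Event(180, "elevatorStarts")]
print(latency_ok(trace, "palletArrives", "elevatorStarts", 50))   # False
print(latency_ok(trace, "palletArrives", "elevatorStarts", 100))  # True
```

The second pallet waits 80 time units, so only the looser bound is met.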



2.1. Use Case Diagrams

A use case diagram is considered a powerful and useful technique for capturing the
user's requirements. It is an easy-to-read diagram that divides the system into its
functional points. A use case can be understood as a functionality or service that the
system offers to its users.
The authors propose an extension to UML that adds a new property to use cases,
called reference. New properties are added to UML by means of tagged values, so
each use case can have a reference that follows a numbering scheme similar to
traditional DFD numbering. Each system-level use case is assigned a reference
(for example, ref=2), and if this use case is refined into sub-use cases, each of them
has a reference that uses the reference of the super-use case as a prefix (for example,
ref=2.3). This numbering scheme can be repeated to any depth. It helps those
involved in the project to relate all the use case diagrams, and it is also used during
the transition from use cases to objects to ease the mapping between the two models.
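The numbering scheme lends itself to simple automation. The sketch below stores the reference as a tagged value under an assumed key `ref`, derives sub-use case references from the parent's reference, and tests whether one use case refines another at any depth. The use case names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UseCase:
    name: str
    # UML tagged values; the proposed "reference" property is kept under "ref"
    tagged_values: dict = field(default_factory=dict)


def refine(parent: UseCase, names: List[str]) -> List[UseCase]:
    """Create sub-use cases whose references use the parent's reference as prefix."""
    base = parent.tagged_values["ref"]
    return [UseCase(n, {"ref": f"{base}.{i}"}) for i, n in enumerate(names, start=1)]


def is_below(ref: str, ancestor: str) -> bool:
    """True if `ref` refines `ancestor` at any depth (e.g. 2.3.1 is below 2)."""
    return ref.startswith(ancestor + ".")


def parent_ref(ref: str) -> Optional[str]:
    """Reference of the super-use case, or None for a system-level use case."""
    head, sep, _ = ref.rpartition(".")
    return head if sep else None


top = UseCase("Handle pallet", {"ref": "2"})
subs = refine(top, ["Load pallet", "Route pallet", "Unload pallet"])
print([u.tagged_values["ref"] for u in subs])   # ['2.1', '2.2', '2.3']
```

Comparing with the `"."` separator appended is what keeps reference 21 from being mistaken for a child of 2.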

Fig. 2 shows the system-level (or top-level) use case diagram of the HIDRO lines,
which shows which actors perform which functionalities. Since use cases have
different impacts on the final system, they must be ranked according to their
importance to the main functionality of the system. This allows the project to follow
a risk-driven process, in which the most important or complex functionalities of the
system are tackled first and the less important ones are treated later.




Figure 2. Use case diagram. 

2.2. Object Diagrams 

Object diagrams are also an important technique for showing the components that
constitute the system. Transforming the use cases, which divide the system
functionally, into objects is a critical task, since there is usually no direct mapping
from use cases to objects. Thus, a strategy composed of some guidelines is needed to
guide the developers of the system in transforming use cases into objects. Some
approaches exist for transforming use case diagrams into object diagrams, but the
majority of them rely on personal intuition and some kind of magic. The authors
have defined a systematic strategy, based on the object types (interface, entity and
control) presented in (Jacobson et al., 1992), for finding the objects of a given
system from its use cases. This strategy, called the 4-step rule set, has been
published in (Fernandes et al., 2000). Fig. 3 depicts the object diagram obtained by
applying the 4-step rule set to the use case diagram of fig. 2.




Figure 3. Object diagram of the HIDRO lines system. 



2.3. Sequence Diagrams

The proposed methodology uses sequence diagrams as an intermediate format to
specify the system's dynamic behaviour. Sequence diagrams correlate several objects
in a particular scenario, which corresponds to a part of each object's life cycle. It is
also possible to inscribe timing constraints in a sequence diagram. Fig. 4 shows one
sequence diagram of the HIDRO lines system.
Figure 4: Sequence diagram.
The authors’ methodology allows the use of non-standard UML sequence diagrams,
called scenery diagrams. These diagrams are very useful when pictorial
representations of the data path/plant’s control sub-sequences are relevant for a
thorough understanding of the controller’s behaviour. Fig. 5 shows a scenery
diagram of the HIDRO lines.




Figure 5: Scenery diagram. 



The methodology also proposes the use of non-standard data path/plant diagrams
for the static specification of the data path/plant’s resources, since UML does not
define any diagram for that purpose. These diagrams are needed to understand the
pictorial representation of the data path/plant’s resources involved in the scenery
diagrams.



3. Petri Nets 

For the system’s components that have a complex or interesting dynamic behaviour, a
state model can be specified. UML has two different meta-models for this purpose:
StateCharts and activity diagrams. Although these two meta-models present many
important characteristics for reactive systems, namely concurrency and hierarchy,
they do not allow an elegant treatment of data path/plant resource management or
the specification of dynamic parallelism. These are two crucial requirements for
complex, distributed and parallel embedded systems, since different parts of the
system may try to access the same resource simultaneously. SpecCharts are another
interesting state-oriented meta-model for specifying and designing embedded
systems (Gajski et al., 1994).

For control embedded systems, applying Petri nets (PNs) to the specification of the
behavioural view can benefit from a huge amount of available research. The designer
can choose, among several PN meta-models, one that was specifically developed to
deal with the semantic specificities of that kind of system, such as the ones referred
to in (Kleinjohann et al., 1997; Sgroi et al., 1998).

PN is a mathematical meta-model that can be formally analysed and for which
several implementation techniques are available. In this context, the authors have
developed an extended PN meta-model, designated shobi-PN, to specify the reactive
behaviour of the system’s components instead of using the “StateCharts + activity
diagrams” UML proposal.
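Because PNs have a precise firing rule, properties such as mutual exclusion on a shared resource can be checked mechanically. The sketch below shows the standard place/transition firing rule and a breadth-first reachability analysis; it is a generic illustration of PN analysis, not the shobi-PN meta-model, and the net and its place names are hypothetical.

```python
from collections import deque


class PetriNet:
    """Place/transition net: each transition maps to (pre, post) arc weights."""

    def __init__(self, places, transitions):
        self.places = list(places)
        self.index = {p: i for i, p in enumerate(self.places)}
        self.transitions = transitions  # name -> ({place: weight}, {place: weight})

    def enabled(self, marking, t):
        pre, _ = self.transitions[t]
        return all(marking[self.index[p]] >= w for p, w in pre.items())

    def fire(self, marking, t):
        """Return the marking reached by firing t (markings are tuples of counts)."""
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t} is not enabled")
        pre, post = self.transitions[t]
        m = list(marking)
        for p, w in pre.items():
            m[self.index[p]] -= w
        for p, w in post.items():
            m[self.index[p]] += w
        return tuple(m)

    def reachable(self, initial, limit=10_000):
        """Breadth-first reachability set; `limit` guards against unbounded nets."""
        seen = {initial}
        queue = deque([initial])
        while queue and len(seen) < limit:
            m = queue.popleft()
            for t in self.transitions:
                if self.enabled(m, t):
                    nxt = self.fire(m, t)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return seen


# Two users sharing one resource (for instance an elevator).
places = ["idle1", "busy1", "idle2", "busy2", "free"]
net = PetriNet(places, {
    "acquire1": ({"idle1": 1, "free": 1}, {"busy1": 1}),
    "release1": ({"busy1": 1}, {"idle1": 1, "free": 1}),
    "acquire2": ({"idle2": 1, "free": 1}, {"busy2": 1}),
    "release2": ({"busy2": 1}, {"idle2": 1, "free": 1}),
})
states = net.reachable((1, 0, 1, 0, 1))
print(len(states))                               # 3
print(all(not (m[1] and m[3]) for m in states))  # True: mutual exclusion holds
```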

3.1. The shobi-PN v1.0

The traditional synchronous and interpreted PN meta-model (SIPN) was developed
only for the specification of the control part of the system: the data path/plant of the
system cannot be described with the mechanisms available in the meta-model. To
overcome this limitation, the shobi-PN v1.0 meta-model, an extension of the SIPN
model, was developed (Machado et al., 2000). The shobi-PN v1.0 meta-model
supports hierarchy and allows objects to be used for specifying the data path/plant
resources.
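The limitation of SIPN can be seen in a minimal sketch of one clock step of a synchronous interpreted PN. It assumes a safe, conflict-free net in which every transition carries a boolean condition on sampled plant inputs and marked places drive boolean outputs; the place, transition and signal names are hypothetical. The plant appears only as boolean signals, with no structure for its resources.

```python
def sipn_step(marking, transitions, inputs):
    """One clock step of a safe, synchronous interpreted Petri net.

    marking     -- set of marked places
    transitions -- name -> (input places, output places, condition on inputs)
    inputs      -- dict of boolean plant signals sampled at this clock edge
    """
    fireable = [(pre, post) for pre, post, cond in transitions.values()
                if pre <= marking and cond(inputs)]
    consumed = [p for pre, _ in fireable for p in pre]
    if len(consumed) != len(set(consumed)):
        raise ValueError("effective conflict: fireable transitions share an input place")
    new_marking = set(marking)
    for pre, _ in fireable:      # all fireable transitions fire simultaneously
        new_marking -= pre
    for _, post in fireable:
        new_marking |= post
    return new_marking


def outputs(marking, place_actions):
    """Interpreted part: an output signal is active while its place is marked."""
    return {signal for place, signal in place_actions.items() if place in marking}


transitions = {
    "start": ({"waiting"}, {"moving"}, lambda i: i["palletPresent"]),
    "stop": ({"moving"}, {"waiting"}, lambda i: not i["palletPresent"]),
}
place_actions = {"moving": "conveyorMotorOn"}
m = sipn_step({"waiting"}, transitions, {"palletPresent": True})
print(m, outputs(m, place_actions))   # {'moving'} {'conveyorMotorOn'}
```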

The shobi-PN v1.0 meta-model presents the same characteristics as the SIPN
meta-model as far as synchronism and interpretation are concerned, but adds new
mechanisms by supporting object-oriented modelling ideas and new hierarchical
constructs, in both the control unit and the data path/plant. This meta-model embodies
concepts present in Synchronous PNs (David et al., 1992), Hierarchical PNs (Fehling,
1993), Coloured PNs (Jensen, 1992) and Object-Oriented PNs (Lakos, 1995). In the
shobi-PN v1.0 meta-model, the tokens represent objects that model data path/plant
resources. The instance variables represent the information that is processed on the
data path/plant, and the methods are the interface between the control unit and the
data path/plant. Each token models a structure of the data path/plant. A node (a
transition or a place) invokes the tokens’ methods when the tokens arrive at that node.
Each arc is associated with one or more colours, which indicate the types of objects
that are allowed to pass through that arc. This means that, for each data path/plant
structure, there is a well-defined path on the PN.
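The roles of tokens, arcs and nodes described above can be sketched in Python under a few assumptions: a token is a plain object whose class plays the role of its colour, an arc admits only tokens of its listed colours, and a node invokes a configured method on each arriving token. The classes `Pallet` and `Elevator` and their methods are hypothetical, and no shobi-PN tool is modelled here.

```python
class Pallet:
    """Hypothetical token class (its colour): a pallet carried by the plant."""

    def __init__(self, pallet_id):
        self.pallet_id = pallet_id   # instance variable: data processed on the plant
        self.tested = False

    def test(self):                  # method: interface between control unit and plant
        self.tested = True


class Elevator:
    """Another hypothetical token class."""

    def __init__(self):
        self.level = 0

    def move_up(self):
        self.level += 1


class Arc:
    def __init__(self, colours):
        self.colours = tuple(colours)    # token classes allowed through this arc

    def admits(self, token):
        return isinstance(token, self.colours)


class Node:
    """A place or transition that invokes methods of the tokens reaching it."""

    def __init__(self, name, invocations):
        self.name = name
        self.invocations = invocations   # token class -> name of method to invoke

    def receive(self, token):
        for cls, method in self.invocations.items():
            if isinstance(token, cls):
                getattr(token, method)()


def move(token, arc, node):
    """Send a token along an arc; the arc's colours fix the token's path."""
    if not arc.admits(token):
        raise TypeError(f"{type(token).__name__} tokens may not use this arc")
    node.receive(token)


pallet = Pallet(1)
move(pallet, Arc([Pallet]), Node("testStation", {Pallet: "test"}))
print(pallet.tested)   # True
```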

Hierarchy can be introduced in the specifications in two different ways: (1) the
control unit is modelled by the PN structure, and macronodes (representing sub-PNs)
may be used to introduce hierarchy in the controller; (2) the data path/plant
resources are represented by the internal structure of the tokens, and hierarchy can
be introduced by aggregation (composition) of several objects inside one single token
(a macrotoken) or by using the inheritance of methods and data structures.
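Both forms of hierarchy might be sketched as follows, assuming a macrotoken is an object that aggregates component objects and a macronode forwards an arriving token through the nodes of its sub-net, which is reduced here to an ordered list of node actions for brevity. All class names are hypothetical.

```python
class Motor:
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class LimitSensor:
    def __init__(self):
        self.reached = False


class ElevatorToken:
    """Macrotoken: a single token aggregating several plant objects."""

    def __init__(self):
        self.motor = Motor()
        self.sensor = LimitSensor()

    def start(self):
        # the macrotoken's method delegates to its component objects
        if not self.sensor.reached:
            self.motor.start()


class FastElevatorToken(ElevatorToken):
    """Inheritance: reuses the parent's data structures and methods, adding a field."""

    def __init__(self):
        super().__init__()
        self.speed_mode = "fast"


class MacroNode:
    """Macronode: stands for a sub-net, simplified to an ordered list of actions."""

    def __init__(self, name, subnet):
        self.name = name
        self.subnet = subnet

    def receive(self, token):
        for node_action in self.subnet:
            node_action(token)


serve = MacroNode("serveFloor", [lambda tok: tok.start()])
fast = FastElevatorToken()
serve.receive(fast)                          # inherited start() runs inside the macronode
print(fast.motor.running, fast.speed_mode)   # True fast
```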

Whenever several methods that use the same data structures are invoked
concurrently on a given token in different nodes, a replica mechanism must be
supported. This mechanism allows a token to be replicated as many times as needed,
so that methods can be invoked concurrently on the same token, but in distinct areas
of the PN. It offers an elegant solution to a complex problem (multiple-sourcing) that
could alternatively, but inefficiently, be solved at the algorithmic level by changing the
PN structure. The mechanism becomes indispensable when the data path cannot be
modelled by hierarchical aggregation. Replicas are then the only way to ensure the
parallelism inherent to the data path/plant structure, provided that the mechanism
does not destroy the consistency of the tokens’ data structures.
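The paper does not describe how replicas are realised. One possible sketch assumes that replicas are independent copies whose results are folded back into the original token when the concurrent regions join, and it approximates the consistency condition by requiring that the concurrent regions write disjoint fields. The `Workpiece` class, the field names and the merge step are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Workpiece:               # hypothetical data path/plant resource carried by a token
    position: int = 0
    temperature: float = 20.0


def replicate(token, n):
    """Create one independent copy of the token per concurrent region of the net."""
    return [copy.deepcopy(token) for _ in range(n)]


def merge(original, replicas, write_sets):
    """Fold each replica's declared fields back into the original token.

    Overlapping write sets are refused, since two replicas updating the same
    field would leave the token's data structure inconsistent."""
    claimed = set()
    for ws in write_sets:
        if claimed & set(ws):
            raise ValueError("replicas write overlapping fields")
        claimed |= set(ws)
    for replica, ws in zip(replicas, write_sets):
        for name in ws:
            setattr(original, name, getattr(replica, name))
    return original


piece = Workpiece()
mover, heater = replicate(piece, 2)
mover.position = 3             # region 1 of the net updates the position
heater.temperature = 25.0      # region 2 concurrently updates the temperature
print(merge(piece, [mover, heater], [{"position"}, {"temperature"}]))
# Workpiece(position=3, temperature=25.0)
```

Because the replicas are copies in this sketch, none of them sees the others' updates before the merge.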

The shobi-PN v1.0 meta-model has been used extensively in several application
domains of medium complexity: industrial controllers (Machado et al., 1997b),
communication interfaces (Machado et al., 1998a) and micro-architectures of
processors (Machado et al., 1998b).

3.2. The shobi-PN v2.0

Using the shobi-PN v1.0 meta-model to specify the behaviour of the level 2
controller of the HIDRO production lines revealed some semantic weaknesses of that
modelling approach, namely when it is necessary to ensure: (1) the violation of levels
of structural hierarchy by introducing tokens/objects in arbitrary zones of the PNs
(very useful when, for some specific objects, it is crucial to bypass some levels of the
controller’s hierarchy); (2) the creation and destruction of objects for momentary
reference to objects that are external to the system; (3) the manipulation of the
original (genuine) objects rather than any replicas that the dynamic execution of the
PNs may create (this is vital for dealing with critical regions when controlling
multiple accesses to shared resources, for instance the elevators in the HIDRO lines
case study).

To solve these three kinds of problems, the authors have extended the shobi-PN v1.0
meta-model, yielding the shobi-PN v2.0 meta-model, by defining: (1) a generalised
arc set (GAS), which allows the use of 16 different types of arcs, each with specific
syntactic and semantic properties within the shobi-PN v2.0 meta-model; (2) the
concept of asynchronous macro-transition (AMT) as an auxiliary mechanism to the
GAS, to solve the specific problem of violations of the levels of structural hierarchy.
Fig. 6 shows one shobi-PN v2.0 specification net of the HIDRO production lines
level 2 controller.

The tokens/objects that must appear in the shobi-PN nets are found by computing the
system’s high-level object diagram (see fig. 7), which is obtained by applying a
filtering and collapsing technique, also developed by the authors, to the global object
diagram (fig. 3).



4. Tools 

From a pragmatic point of view, a methodology for developing systems can only be
useful to its users if tools exist that support the development tasks. The proposed
methodology is an ongoing project, but some tools are already available to
developers.

A graphical environment has been developed for animation/simulation; it generates
UML sequence diagrams. These diagrams are built from the system specification and
allow the designers to compare them with similar diagrams constructed previously in
co-operation with the system’s customers.




Figure 6: A shobi-PN v2.0 specification net. 

This allows the methodology to follow the operational approach (Zave, 1984),
which permits customers to validate their requirements directly from the system
specification (i.e. without having to fully develop a system prototype or even the
system itself). If errors are detected, the system specification can be modified
before the implementation phase, which greatly reduces development costs and
increases the system’s correctness. Other tools are under development, namely
graphical editors (for specifying the systems) and compilers (for automatically
generating C code for MCS-51 compatible microcontrollers). A preliminary version
of this tool-set is expected to be available very soon. The tool-set descends from the
one the authors presented in (Machado et al., 1997a): it directly supports the
shobi-PN meta-model, using the Oblog language (www.oblog.com) as the
implementation support.
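The validation step described above amounts to comparing two message sequences. A minimal sketch, assuming each sequence diagram is flattened into an ordered list of (sender, receiver, message) triples, reports the first point where the simulated scenario departs from the one agreed with the customer. The object and message names are hypothetical, and the environment's real data format is not documented in the paper.

```python
from typing import List, NamedTuple, Optional


class Message(NamedTuple):
    sender: str
    receiver: str
    name: str


def first_divergence(generated: List[Message], expected: List[Message]) -> Optional[int]:
    """Index of the first message where the simulated scenario departs from the
    scenario agreed with the customer, or None if both diagrams agree."""
    for i, (g, e) in enumerate(zip(generated, expected)):
        if g != e:
            return i
    if len(generated) != len(expected):
        return min(len(generated), len(expected))  # one diagram is a prefix of the other
    return None


agreed = [Message("Sensor", "Controller", "palletDetected"),
          Message("Controller", "Elevator", "moveUp")]
simulated = [Message("Sensor", "Controller", "palletDetected"),
             Message("Controller", "Conveyor", "start")]
print(first_divergence(simulated, agreed))   # 1
```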
Figure 7: High-level object diagram of the HIDRO lines.



5. Conclusions 

This paper presents the general characteristics of a UML-based methodology
supporting the design of complex embedded systems. The authors advocate
PN-based behavioural specifications instead of UML StateCharts and activity
diagrams. In this context, the shobi-PN v2.0 meta-model was outlined, with emphasis
on the usefulness of its GAS and AMT concepts. The paper also describes the
simulation environment the authors have developed to directly support the design of
complex embedded controllers. All the standard and non-standard UML diagrams
shown in this paper relate to the Blaupunkt-Bosch car radio production lines. As a
typical example of a complex embedded system, the controller for these production
lines has been used by the authors to validate their methodology.
Engineers are more and more often faced with the hard problem
of developing more sophisticated real-time systems in a world
where time-to-market constraints are becoming ever tighter.
Object-oriented modeling with UML brings significant
answers to these issues. However, the specification of an
application's real-time behavior is not yet completely satisfactory.
Available methods provide good support for modeling the
parallelism of an application but are often poor at expressing
real-time features like deadlines, periods and priorities. In this
paper, we present a specific use of UML supporting qualitative
(e.g. multitasking, data sharing, etc.) and quantitative (e.g.
deadline, period, etc.) aspects of real-time behavioral
specification. For that purpose, we introduce the concept of the
UML active object and present a structured way to use UML
Statecharts in order to describe the behavior of an active object
without losing any object properties.

Keywords : Real-time, UML, Embedded systems, Active Object. 



1. Introduction 

In recent years UML has become the lingua franca among system modelers all
over the world. Although it has been successful in the software domain, UML still
lacks significant semantics that would allow it to dominate specific domains, like that
of real-time systems; the description of the real-time behavior of such systems is not
yet completely satisfactory. Available methods (like UML/SDL [1], UML/RT [2],
RT/UML [3] or OCTOPUS [4]) provide good support for modeling







Architecture and Design of Distributed Embedded Systems 



the parallelism in an application, but are often poor at expressing quantitative
real-time features such as deadlines, periods and priorities.

The methodology introduced in this paper is based on the ACCORD/UML
approach ([5] [6] [7]). Its main contributions concern the behavioral specification of
objects and the integration of real-time properties into the UML models of the
system under development. Real-time properties are added to the UML models from
the early specification stages and are carried consistently down to the final
implementation stages, where C/C++ code is produced.

The rest of the paper is organized in two parts. The first presents a structured way
to use UML Statecharts; the second demonstrates the effectiveness of the proposed
methodology through the design of a real-world application borrowed from the
domain of real-time industrial networks.



2. A specific use of UML Statecharts for real-time system design

In UML, a Statechart has a context that may be either a classifier (such as a class or
a use case) or a behavioral feature (such as an operation or a method). Within the
ACCORD/UML approach, Statecharts are used at only two levels of granularity to
design the behavior of an application:

• Class behavior is described through a restrictive and specialized use of
UML Statecharts, largely based on protocol Statecharts as defined in the UML
semantics;

• Operation behavior is described via an alternate view of Statecharts. We
introduce this view not to define our own action language, but simply
because UML lacks an action language (at the time of writing).

2.1. Class behavior model: protocol and triggering Statecharts

Within the ACCORD/UML approach, the Statechart attached to a class models its
behavior and can be considered from two points of view (abstractions): the protocol
view and the triggering view. In UML, objects of an application communicate
through message passing, which results either from a signal being raised or from an
operation being invoked.

2.1.1. Protocol view of a class’ behavior

The Statechart describing the protocol view of a class describes all the possible
behaviors of a class’ instance when it receives a message in the form of a called
operation. This view allows the designer to specify which operation calls are
possible in each state of the object. In other words, the protocol view describes
“what a class can do”. The Statechart describing the protocol view of an object
contains only protocol-transitions (transitions between the states of the Statechart
describing the protocol view), as illustrated in Figure 1. The behavior specified
within a protocol Statechart applies regardless of the object’s type: standard or
real-time active object.
A transition usually has two distinct parts: the left-side part, which specifies the
triggering condition of the transition, and the right-side part, which describes the
actions to execute when the transition is fired. A protocol-transition has only the
left-side part, i.e. a trigger event (which has to be of type CallEvent), possibly with a
guard. The right-side part of the transition, i.e. the effect specification, is empty. In
fact, the action sequence is implicit in a protocol-transition specification. When such
a transition is fired, everything happens as if an internal event were sent, triggering
the execution of the method implementing the operation associated with the call
event that triggered the transition. This rule implies that each operation defined in the
interface of an object has to be attached to one method implementing its behavior.
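The semantics of protocol-transitions can be sketched in Python as a table from (state, operation) pairs to an optional guard and a target state: a call is accepted only if such an entry exists and its guard holds, and the implicit action is the execution of the method implementing the operation. The state name `Regulating`, the guard and the method bodies are assumptions for illustration; only the state `Off` and the operation names come from the SpeedRegulator example.

```python
class ProtocolViolation(Exception):
    """Raised when an operation is called in a state where the protocol forbids it."""


class SpeedRegulator:
    # Protocol view: (source state, operation) -> (guard or None, target state).
    PROTOCOL = {
        ("Off", "startRegulating"): (None, "Regulating"),
        ("Regulating", "maintainSpeed"): (lambda r: r.target_speed > 0, "Regulating"),
        ("Regulating", "stopRegulating"): (None, "Off"),
    }

    def __init__(self):
        self.state = "Off"
        self.target_speed = 0
        self.log = []

    def call(self, operation):
        """Handle a CallEvent by firing the matching protocol-transition."""
        entry = self.PROTOCOL.get((self.state, operation))
        if entry is None:
            raise ProtocolViolation(f"{operation}() not allowed in state {self.state}")
        guard, target = entry
        if guard is not None and not guard(self):
            raise ProtocolViolation(f"guard of {operation}() is false")
        getattr(self, "_" + operation)()   # implicit action: run the method
        self.state = target

    # Methods implementing the operations; the Statechart itself holds no code.
    def _startRegulating(self):
        self.target_speed = 100
        self.log.append("start")

    def _maintainSpeed(self):
        self.log.append("maintain")

    def _stopRegulating(self):
        self.target_speed = 0
        self.log.append("stop")


r = SpeedRegulator()
for op in ("startRegulating", "maintainSpeed", "stopRegulating"):
    r.call(op)
print(r.state, r.log)   # Off ['start', 'maintain', 'stop']
```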




Figure 1: Protocol view of the global behavior of the SpeedRegulator class.

The syntax of a protocol-transition label is then:

event-name '(' comma-separated-parameter-list ')' '[' guard-condition ']'

with the following WFR¹ applied in the context of a protocol-transition:
self.trigger.oclIsTypeOf(CallEvent).
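A minimal parser for this label syntax might look as follows. It treats the guard as optional (the text allows a trigger "with possibly a guard"), assumes parameters contain no nested parentheses and guards contain no closing bracket, and checks only the textual form; the WFR on the trigger's type would still have to be checked against the class model. The operation name `setTarget` is hypothetical.

```python
import re

# event-name '(' comma-separated-parameter-list ')' [ '[' guard-condition ']' ]
PROTOCOL_LABEL = re.compile(
    r"^\s*(?P<event>[A-Za-z_]\w*)\s*\((?P<params>[^)]*)\)"
    r"\s*(?:\[(?P<guard>[^\]]*)\])?\s*$"
)


def parse_protocol_label(label):
    """Split a protocol-transition label into (event, parameters, guard).

    A protocol-transition has no effect part, so a label containing
    '/ action' is rejected."""
    m = PROTOCOL_LABEL.match(label)
    if m is None:
        raise ValueError(f"not a protocol-transition label: {label!r}")
    params = [p.strip() for p in m["params"].split(",") if p.strip()]
    guard = m["guard"].strip() if m["guard"] is not None else None
    return m["event"], params, guard


print(parse_protocol_label("setTarget(speed) [speed > 0]"))
# ('setTarget', ['speed'], 'speed > 0')
```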

2.1.2. Triggering view of a class’ behavior 

Apart from CallEvent (used in the protocol view), UML defines additional event
types (SignalEvent, ChangeEvent, TimeEvent, CompletionEvent) that can also trigger
Statechart transitions. To manage such events and to specify the additional
behavioral requirements specific to active objects, ACCORD/UML defines a second
view of the class’ behavior, the triggering view.

The triggering view is also defined through Statecharts. It defines the reactions of
an object in the following cases: (a) when the object receives signals to which it is
declared sensitive, (b) when it reaches a particular state (specification of completion
transitions), (c) when it detects a change in the system through a specific boolean
expression (ChangeEvent), and (d) when a timer expires. The triggering view of an
object focuses on “what a class must do”.

The semantics of the triggering view transitions (called triggering-transitions) is
mainly based on the principle of protocol-transitions, i.e. under normal conditions,
firing a transition involves executing the method associated with an operation.
Unlike protocol-transitions, the action sequence of a triggering-transition has to be
specified explicitly and must be a single action (always of type CallAction).
Moreover, the operation attached to this call action has to belong to the interface of
the described object. For the left-side part of a triggering-transition, all UML event
types are allowed except CallEvent.

The syntax of a triggering-transition label is as follows:



¹ Well-Formedness Rules.
event-name '(' comma-separated-parameter-list ')' '[' guard-condition ']' '/'
operation-name '(' comma-separated-parameter-list ')'

with the following WFRs applied in the context of a triggering-transition:

[1] self.trigger.oclIsTypeOf(SignalEvent) or self.trigger.oclIsTypeOf(ChangeEvent)
or self.trigger.oclIsTypeOf(TimeEvent) or self.trigger.oclIsTypeOf(CompletionEvent)

[2] Transition currentTransition = self; self.effect.oclIsTypeOf(CallAction) and
self.stateMachine.context->exists(feature = currentTransition.effect.operation)
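The two WFRs can be read as simple checks on a transition object. The sketch below represents UML event metaclasses by their names and a CallAction by the operation it calls. These representations are simplifications assumed for illustration, not a UML metamodel API, and the state name `Regulating` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# WFR [1]: every UML event kind except CallEvent may trigger a triggering-transition.
TRIGGERING_EVENT_KINDS = {"SignalEvent", "ChangeEvent", "TimeEvent", "CompletionEvent"}


@dataclass
class CallAction:
    operation: str


@dataclass
class Transition:
    source: str
    target: str
    trigger_kind: str                  # name of the UML event metaclass
    effect: Optional[CallAction] = None


def violated_wfrs(t: Transition, context_operations: Set[str]) -> List[int]:
    """Return the numbers of the WFRs that triggering-transition t violates."""
    violations = []
    if t.trigger_kind not in TRIGGERING_EVENT_KINDS:     # WFR [1]
        violations.append(1)
    if not (isinstance(t.effect, CallAction)             # WFR [2]
            and t.effect.operation in context_operations):
        violations.append(2)
    return violations


ops = {"startRegulating", "stopRegulating", "maintainSpeed"}
ok = Transition("Off", "Regulating", "SignalEvent", CallAction("startRegulating"))
print(violated_wfrs(ok, ops))                                            # []
print(violated_wfrs(Transition("Off", "Regulating", "CallEvent"), ops))  # [1, 2]
```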

Figure 2 illustrates the modeling of the triggering view of the SpeedRegulator
class. The structural specification of the class shows that it is able to receive the
OnOff signal. The designer may also specify the behavior of class instances after
receiving a specific signal type (always depending on their current state). For
example, in state Off and on receipt of the OnOff signal, the regulator object
executes the method implementing its startRegulating() operation.
Figure 2: Triggering view of the global behavior of the SpeedRegulator class.

The triggering view of an object’s behavior is not independent of its protocol
view. Two principal rules must be respected when modeling the triggering view in
relation to an existing protocol view:

1. the set of states that an object may have throughout its life is fully defined in
the specification of its protocol view. Therefore, the triggering view must not
have states that are not defined within the protocol view. This means that a
triggering-transition may exist only if there exists a protocol-transition with the
same source and target states whose firing involves the execution of the
same operation;

2. a triggering-transition from state S1 to state S2 with the label
evt [g] / opeName(comma-separated-parameter-list) is “lawful” if the
corresponding protocol view defines a protocol-transition from state S1 to
state S2 with the label opeName(comma-separated-parameter-list) [g].

The Statechart describing the class’ global behavior consists of the states and
protocol-transitions defined in its protocol view and the triggering-transitions
defined in its triggering view. The two views are not two different Statecharts
composed in some way to give the class’ global behavior. Rather, they are two
abstractions of the class’ global behavior, each focusing on a particular aspect of it.
Figure 3 depicts the global behavior of the SpeedRegulator class derived from the
protocol and triggering views defined above.
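Both rules, and the construction of the global behavior, can be checked mechanically. In this sketch, transitions are plain named tuples and guards are compared as text. It also simplifies by collecting the states from the protocol-transitions, so an isolated protocol state would be missed, and the state name `Regulating` is an assumption.

```python
from typing import List, NamedTuple, Optional


class ProtocolTransition(NamedTuple):     # opName(params) [g]
    source: str
    target: str
    operation: str
    guard: Optional[str] = None


class TriggeringTransition(NamedTuple):   # evt [g] / opName(params)
    source: str
    target: str
    event: str
    operation: str
    guard: Optional[str] = None


def view_errors(protocol, triggering) -> List[str]:
    """Check rules 1 and 2 for every triggering-transition."""
    states = {s for pt in protocol for s in (pt.source, pt.target)}
    errors = []
    for tt in triggering:
        if tt.source not in states or tt.target not in states:
            errors.append(f"rule 1: {tt.source}->{tt.target} uses an unknown state")
        elif not any((pt.source, pt.target, pt.operation, pt.guard)
                     == (tt.source, tt.target, tt.operation, tt.guard)
                     for pt in protocol):
            errors.append(f"rule 2: no protocol-transition matches "
                          f"{tt.event} / {tt.operation}()")
    return errors


def global_behavior(protocol, triggering):
    """Protocol-view states and transitions plus the triggering-transitions."""
    states = {s for pt in protocol for s in (pt.source, pt.target)}
    return states, list(protocol) + list(triggering)


protocol = [ProtocolTransition("Off", "Regulating", "startRegulating"),
            ProtocolTransition("Regulating", "Off", "stopRegulating")]
triggering = [TriggeringTransition("Off", "Regulating", "OnOff", "startRegulating"),
              TriggeringTransition("Regulating", "Off", "OnOff", "stopRegulating")]
print(view_errors(protocol, triggering))   # []
```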

The main interest of structuring the behavioral specification in this way is to
separate the responsibilities of the class’ behavior specification. The protocol view,
defining “what a class can do”, is independent of the class’ nature: standard or
real-time active. The triggering view, defining “what a class must do”, is only
available for real-time active classes. Real-time active classes can also be easily
reused, either as real-time active objects or as standard objects. In the latter case, the
reactive aspects of the object behavior specified in its triggering view are inhibited.

The behavioral specification described in the previous sections was limited to the
description of the control logic of a class. To go further in the behavioral
description, the user needs to model the algorithmic part of the application under
development. Unlike other OO approaches for real-time system development, the
ACCORD/UML method avoids mixing the class’ control logic with its
algorithmic specification. For that purpose, firing a transition of a class’ behavior,
whether a protocol-transition or a triggering-transition, implies the execution of the method
implementing the operation specified, implicitly or explicitly, in the label of the fired
transition. In fact, the transitions of a class’ behavior specify no action other
than a call to one operation of the class interface. All algorithmic
descriptions, and therefore all executable code, are deferred to the
methods implementing the class’ operations.
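The following minimal Python sketch illustrates this separation. It is a hypothetical illustration, not ACCORD/UML-generated code. Each transition only names one operation, and all algorithmic code sits in the methods. A `real_time_active` flag, which is an assumed name, shows the triggering view being inhibited when the class is reused as a standard object.

```python
class SpeedRegulator:
    """Hypothetical sketch: transitions only name one operation; all
    algorithmic code lives in the methods implementing the operations."""

    # protocol view: (source state, operation) -> target state
    PROTOCOL = {("Off", "startRegulating"): "On",
                ("On", "stopRegulating"): "Off"}
    # triggering view: (source state, signal) -> operation to call
    TRIGGERING = {("Off", "OnOff"): "startRegulating",
                  ("On", "OnOff"): "stopRegulating"}

    def __init__(self, real_time_active=True):
        self.state = "Off"
        self.real_time_active = real_time_active
        self.trace = []

    # Methods: the only place where algorithmic code is written.
    def startRegulating(self):
        self.trace.append("startRegulating")

    def stopRegulating(self):
        self.trace.append("stopRegulating")

    def call(self, operation):
        """Fire a protocol-transition: its only action is the operation call."""
        target = self.PROTOCOL.get((self.state, operation))
        if target is None:
            raise RuntimeError(f"{operation}() not allowed in state {self.state}")
        getattr(self, operation)()
        self.state = target

    def signal(self, name):
        """Fire a triggering-transition; inhibited for a standard object."""
        if not self.real_time_active:
            return False
        operation = self.TRIGGERING.get((self.state, name))
        if operation is None:
            return False
        self.call(operation)   # goes through the protocol view (raises if unlawful)
        return True

rt = SpeedRegulator()                         # reused as a real-time active object
rt.signal("OnOff")                            # reacts: Off -> On
std = SpeedRegulator(real_time_active=False)  # reused as a standard object
std.signal("OnOff")                           # ignored: triggering view inhibited
std.call("startRegulating")                   # the protocol view still applies
```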




Figure 3: Global behavior of the SpeedRegulator class.



2.2. Operation behavior 



After specifying the class’ control logic, the designer models the behavior
of the class’ operations. UML currently offers no well-defined way to
specify algorithms. Since Statecharts may be used to specify behavioral features such
as methods, i.e. implementation specifications of the class’ operations, ACCORD/UML
builds on Statecharts to define the behavioral specification of an object’s
operations. More specifically, methods are described through a specific use of
Statechart decomposition^ into sequences of UML elementary actions: send action, call
action, create action, terminate action, destroy action, return action and assignment
action.

The actual goal of this approach is not the description of complex computation
algorithms. When modeling the algorithm Statechart of an operation, the designer has
to keep in mind that the communication between the object and the other real-time
objects of the application must be specified, in order to make explicit the
synchronization points between all real-time objects of the application. A real-time
object may communicate with other real-time objects in two ways: either by sending
signals or by invoking operations (which may or may not involve synchronization
with other objects).

Regarding the UML semantics of communication via signals, ACCORD/UML clarifies
the following points: communication by signal sending is asynchronous and relies on
a broadcast mechanism. In other words, a signal communication involves one sender,
which does not know the receivers (classes willing to receive a signal have
declared it in their structural model), and N receivers, which do not know the sender.
Sending a signal generates a signal event for all potential receivers, and each target
reacts according to its triggering view specification by executing a method.

^ The decomposition follows the action sequences as defined in the UML Statechart package.
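The broadcast semantics can be sketched in a few lines of Python. This is a hypothetical model, not the ACCORD/UML runtime. The classes `SignalBroadcaster` and `Receiver` and the receiver `display` are assumptions. Sending only queues a signal event for every receiver that declared the signal, so the sender never waits and never learns who the receivers are.

```python
from collections import defaultdict, deque

class Receiver:
    """Hypothetical receiver with a queue of pending signal events."""
    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.handled = []

    def step(self):
        """Consume one signal event (stands for the triggered method)."""
        if self.pending:
            self.handled.append(self.pending.popleft())

class SignalBroadcaster:
    """Senders and receivers never reference each other directly."""
    def __init__(self):
        self._declared = defaultdict(list)   # signal -> receivers

    def declare(self, signal, receiver):
        # stands for declaring the signal in the receiver's structural model
        self._declared[signal].append(receiver)

    def send(self, signal, *params):
        # asynchronous: only queues a signal event and returns immediately
        for receiver in self._declared[signal]:
            receiver.pending.append((signal, params))

bus = SignalBroadcaster()
regulator, display = Receiver("regulator"), Receiver("display")
bus.declare("OnOff", regulator)
bus.declare("OnOff", display)
bus.send("OnOff")          # the sender does not know who receives it
regulator.step()
```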

The second possible communication type is the operation call. This communication
mode can be asynchronous, synchronous or, in ACCORD/UML, delayed
synchronous. The difference between the last two is that with a purely synchronous
operation call, the caller blocks as soon as it has sent the message, whereas with a
delayed synchronous operation call, the caller can continue its execution and waits
for the return event only when it needs to use the result. For this last
communication mode, the sender uses a special communication mechanism called the
reply box (Rbox). When an object sends a delayed synchronous message to another
object, it attaches an instance of the Rbox class to its request. The called object
puts its answer in this reply box, and the caller can check at any time whether the answer is
ready and take the response as soon as it becomes available. An Rbox is thus a special kind of
variable needed to receive a result from a called object. The Rbox notion is similar to
the notion of future in ABCL [8] or the continuation box (Cbox) in Act++ [9].
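A minimal Python sketch of a reply box follows, assuming a thread per call for simplicity. The names `Rbox`, `delayed_call` and `get_speed` are hypothetical. Python's own `concurrent.futures.Future` implements the same future pattern. The caller blocks only when it calls `get()`, and it can poll `is_ready()` beforehand.

```python
import threading

class Rbox:
    """Hypothetical reply box for one delayed synchronous call."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def put(self, value):            # used by the called object
        self._value = value
        self._ready.set()

    def is_ready(self):              # the caller may poll at any time
        return self._ready.is_set()

    def get(self, timeout=None):     # the caller blocks only here
        if not self._ready.wait(timeout):
            raise TimeoutError("reply not yet available")
        return self._value

def delayed_call(operation, *args):
    """Send the request with an attached Rbox and return without waiting."""
    box = Rbox()
    worker = threading.Thread(target=lambda: box.put(operation(*args)), daemon=True)
    worker.start()
    return box

def get_speed():                     # hypothetical called operation
    return 87.5

box = delayed_call(get_speed)
# ... the caller continues its own work here ...
speed = box.get(timeout=1.0)         # waits only if the reply is not there yet
```

If the called operation raises an exception, this sketch never fills the box, and `get()` eventually times out. A production version would also transport the exception.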




Figure 4: Behavior of the method implementing the maintainSpeed() operation.

The Statechart describing an operation algorithm always starts with an initial
state connected to a state named <operation-name>Begin by a transition labeled
start_<opeName>‘(’comma-separated-parameter-list‘)’. This label in fact specifies
an event, more precisely an internal event of the class’ behavior. This internal
event may be generated under two conditions: either when a protocol-transition
labeled <opeName>‘(’comma-separated-parameter-list‘)’ is fired, or when a
triggering-transition labeled Sig‘(’comma-separated-event-parameter-list‘)’
[g]/<opeName>‘(’comma-separated-parameter-list‘)’ is activated.

For example, when the regulator object reaches the On state, it automatically fires
the completion transition labeled “/maintainSpeed()”, which generates
the internal event start_maintainSpeed(). This event is implicitly directed towards
the state machine describing the behavior of the method implementing the
maintainSpeed() operation, thereby triggering its execution.
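The following Python sketch shows how this event routing could look. It is a hypothetical model: the class `OperationRegion`, the function `fire_transition` and the `idle` waiting state are assumptions. Firing a transition labeled with an operation produces the internal event start_<opeName>, and only the region owning that operation reacts.

```python
class OperationRegion:
    """Hypothetical AND-state holding the method Statechart of one operation."""
    def __init__(self, operation, method):
        self.operation = operation
        self.method = method
        self.state = "idle"
        self.executions = 0

    def dispatch(self, event, *params):
        # only the internal event start_<opeName> starts this region
        if event != f"start_{self.operation}":
            return False
        self.state = f"{self.operation}Begin"
        self.method(*params)
        self.executions += 1
        self.state = "idle"              # wait for the next start event
        return True

def fire_transition(operation, regions, *params):
    """Firing a transition labeled .../<opeName>(params) generates the
    internal event start_<opeName>(params) for the operation regions."""
    event = f"start_{operation}"
    return [r.operation for r in regions if r.dispatch(event, *params)]

maintain = OperationRegion("maintainSpeed", lambda: None)
stop = OperationRegion("stopRegulating", lambda: None)
started = fire_transition("maintainSpeed", [maintain, stop])  # completion transition in On
```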

2.3. Summary of behavioral specification 
To summarize the two previous sections, the behavior of a
class is considered as a state machine owning several AND-states. One AND-state covers the class’ behavior
itself (the protocol and triggering views that constitute the control logic specification, as
described in section 2.1), and there is one per operation behavioral specification (the
algorithmic part specification, as described in section 2.2). The resulting
specification for the SpeedRegulator class is given in Figure 5.

If the SpeedRegulator behavior had been modeled following usual OO
approaches, the result could look like the Statechart presented in Figure 6. The
main drawback of such a modeling approach, in which a single Statechart
specifies all behavioral aspects of a class, is that it impairs the object features of the
class. Under these conditions it is, for example, very difficult to use the inheritance features
of the object paradigm. As Figure 6 shows, the control logic and the
algorithmic specifications, which are usually contained in the bodies of the class’ operations,
are mixed in the same Statechart specification.

Moreover, the implementation of an operation is often spread over
different transitions of the same Statechart. Inheriting an
operation from a parent class and refining its behavior is therefore not an easy task. The
structured way of using UML Statecharts to specify the class’ behavior
presented in section 2 allows a UML model to preserve its OO features. For
example, if a class inherits an operation, it only has to redefine the AND-state
defining that operation’s behavior.
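The following Python sketch illustrates this benefit under a simplifying assumption: each AND-state is modeled as a method returning a description, rather than as a real Statechart. A subclass that redefines one operation replaces exactly one AND-state and inherits the control logic unchanged. All class and method names here are hypothetical.

```python
class SpeedRegulatorBehavior:
    """Hypothetical: class behavior as AND-states, one for the control logic
    and one per operation; a subclass redefines only what it changes."""
    def and_states(self):
        return {"control": self.control_logic,
                "startRegulating": self.start_regulating_method,
                "maintainSpeed": self.maintain_speed_method}

    def control_logic(self):
        return "protocol + triggering views"

    def start_regulating_method(self):
        return "local start"

    def maintain_speed_method(self):
        return "local speed reading"

class DistributedSpeedRegulatorBehavior(SpeedRegulatorBehavior):
    # only the AND-state of the redefined operation changes
    def maintain_speed_method(self):
        return "speed read via fieldbus"

base = {k: f() for k, f in SpeedRegulatorBehavior().and_states().items()}
derived = {k: f() for k, f in DistributedSpeedRegulatorBehavior().and_states().items()}
changed = sorted(k for k in base if base[k] != derived[k])
```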





Figure 5: ACCORD/UML specification of SpeedRegulator behavior.

Figure 6: Usual OO specification of SpeedRegulator behavior.

Finally, regarding the proposed Statechart structuring for describing the behavior
of a class, it is important to clarify whether, and how, the RTC^ assumption of UML Statecharts
also holds in our case. In fact, this assumption is adapted to the real-time active
object paradigm [10] in order to respect the concurrency policy adopted
within our execution model, namely the “1 writer and N readers”
concurrency policy. Because of its complexity, this mechanism is not presented in this paper;
it is fully described in [11]. Applying the RTC
assumption to the real-time active object paradigm relies on relaxing the RTC
assumption in some AND-states of the resulting class’ behavior, more
particularly in the AND-states describing the behavior of operations that are specified
as writing operations in the class’ structural specification.

^ Run To Completion.
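The “1 writer and N readers” policy itself is the classic readers-writer discipline. The Python sketch below is a generic readers-writer lock and not the ACCORD/UML mechanism described in [11]. The class name `OneWriterNReaders` is hypothetical. Any number of reading operations may hold the lock at the same time, while a writing operation gets exclusive access.

```python
import threading

class OneWriterNReaders:
    """Generic readers-writer lock: many concurrent readers or exactly one
    writer. Simple version: waiting writers may starve under heavy reading."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()         # wake readers and writers

lock = OneWriterNReaders()
lock.acquire_read()      # e.g. a reading operation of the object
lock.acquire_read()      # a second reader may enter concurrently
lock.release_read()
lock.release_read()
lock.acquire_write()     # a writing operation gets exclusive access
lock.release_write()
```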



3. Case study: UML modeling of a real time industrial application

In the rest of the paper we will present how the ACCORD/UML methodology can be 
used for the design of a real world application. The example is borrowed from the 
domain of industrial networks and its goal is to extend the scenario described in 
Figure 4 and Figure 5 using fieldbus networks. 

From the specification point of view, the models describing the application must
reflect inherent concepts of fieldbuses. These include the maximum time between network events,
periodic traffic (data arriving at specific periods of time), episodic traffic (unpredictable but still
bounded), and the skews and jitter (the variation around the period of the
arriving messages) produced during data updates.
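Jitter and the maximum time between events can be measured from message timestamps. The helper below is a hypothetical sketch. It defines jitter as the largest absolute deviation of an inter-arrival interval from the nominal period, which is one common reading of “variation around the period”. Timestamps are integer microseconds.

```python
def traffic_stats(arrivals_us, period_us):
    """Hypothetical helper: timestamps (us) of successive messages of one
    periodic variable -> (jitter, largest gap between messages)."""
    intervals = [b - a for a, b in zip(arrivals_us, arrivals_us[1:])]
    jitter = max(abs(iv - period_us) for iv in intervals)  # deviation around the period
    return jitter, max(intervals)

arrivals = [0, 10_200, 19_900, 30_100, 40_000]   # hypothetical, nominal period 10 ms
jitter, max_gap = traffic_stats(arrivals, 10_000)
# jitter == 300, max_gap == 10_200
```

A bounded-traffic requirement could then be expressed as a check such as `max_gap <= bound`, where `bound` is the specified maximum time between events.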
Figure 7: Distributed case collaboration diagram.

Figure 7 presents a single fieldbus connecting two application entities running on
different nodes (a producer and a consumer). We assume that the
consumer application entity is regulator, an instance of the SpeedRegulator class.
At some point of its execution, this object needs the current speed value, which is
produced by the spMeter object. Assuming that the spMeter object (an instance of the
SpeedSensor class) runs on a different node, we obtain a slightly different model
of the operation Statechart presented in Figure 4. In fact, we have a distributed
implementation of the maintainSpeed() operation based on the distribution of the
speed values by the fieldbus^.

The behavior of the WorldFIP class reduces to periodic updates,
at precise intervals with period Tcyc. In each update, the local image of the speed variable at the
regulator node is refreshed from the corresponding local image of the variable at the node
of spMeter. The operation writeLoc() is used by the producer to update its local
image of the speed variable, while readLoc() is used by the consumer to retrieve
the current local image of the same distributed variable. Each time the speed value is
broadcast, the WorldFIP object sends a sent() message to the producer and a
received() message to the consumer of the speed value. Figure 8 and Figure 9
illustrate the new behavior specifications of the SpeedRegulator and SpeedSensor
classes following this discussion.



^ In the specific implementation, the fieldbus is WorldFIP [12], and the behavior of its periodic
services component is based on the MPS (Manufacturing Periodical/aperiodical Services)
of WorldFIP.
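The WorldFIP behavior described above can be modeled in a few lines of Python. This is a hypothetical model of one periodically broadcast variable, not the real WorldFIP or MPS programming interface. Only the operation names writeLoc(), readLoc(), sent() and received() come from the text. Once per Tcyc, `cycle()` copies the producer's local image to the consumer's local image and notifies both sides.

```python
class WorldFIPVariable:
    """Hypothetical model of one periodically broadcast variable (not the
    real WorldFIP/MPS API): once per period Tcyc, the producer's local image
    is copied to the consumer's local image."""
    def __init__(self, sent, received):
        self.producer_image = None
        self.consumer_image = None
        self.sent = sent            # notification to the producer (spMeter)
        self.received = received    # notification to the consumer (regulator)

    def writeLoc(self, value):      # used by the producer
        self.producer_image = value

    def readLoc(self):              # used by the consumer
        return self.consumer_image

    def cycle(self):                # executed once every Tcyc by the bus
        self.consumer_image = self.producer_image
        self.sent()
        self.received()

log = []
speed = WorldFIPVariable(lambda: log.append("sent"), lambda: log.append("received"))
speed.writeLoc(87.5)
before = speed.readLoc()    # still the old image until the next cycle
speed.cycle()
after = speed.readLoc()
```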
Figure 8: Distributed behavior of SpeedRegulator.

Figure 9: Distributed behavior of SpeedSensor.



The previous discussion dealt with period and delay timing constraints. However, the
described distributed solution also introduces a skew, namely between the
time a speed value is captured by the measuring algorithm of spMeter and the time
that value is used in the regulation algorithm executing at the regulator
site. If the consumer and producer entities operate completely asynchronously,
this skew can range from zero to rt_update. If this skew is subject to a constraint,
a synchronization mechanism between the producer and consumer Statecharts must
be implemented. This synchronization is achieved by using the sent() and
received() messages of the WorldFIP object together with a ready time constraint.
The constraint is attached to the operation that uses the produced value in the consuming entity, in order
to delay consumption until the value has circulated on the bus. The actual timing and the
corresponding constraints are shown in the sequence diagram of Figure 10.




Figure 10: Distributed case sequence diagram. 
4. Conclusions

In the previous sections we presented an approach for modeling real time
applications with UML. The ACCORD/UML design approach described here does not
extend the UML semantics in order to fulfill the design needs of real time
applications. Instead, it proposes a design practice that relies on a specific use of UML
Statecharts and on the concept of UML tagged values. As shown in the case
study section, the designer can adopt the design framework proposed by
ACCORD/UML to describe real time aspects such as periods, skews, jitter and
deadlines. In this context, the high-level system models of an application taken
from the domain of real time industrial networks were illustrated.

The ACCORD/UML method is supported by Objecteering 4.30b^. The available
configuration includes a set of modules that generate C++ code implementing the real-time
behavior of an object from its Statechart specification. Moreover, a
specialized C++ generator has been developed to support the concepts
of the extended active object. Finally, two layers between the application and the underlying real time
operating system (RTOS) are also available, namely the
ACCORD/UML Kernel and the ACCORD/UML virtual machine. The
first one implements the mechanisms supporting active object semantics, above all
the mechanism for scheduling application tasks under an EDF^ policy. The
second one makes the application independent of the underlying RTOS.
The latter exists for Solaris and VxWorks 5.2.
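EDF (Earliest Deadline First) always runs the ready task with the earliest absolute deadline. The sketch below is a generic illustration and not the ACCORD/UML Kernel. It only orders a snapshot of ready tasks using a binary heap. A real preemptive EDF scheduler re-evaluates this choice whenever a task is released or completes. The task names and deadlines are hypothetical.

```python
import heapq

def edf_order(ready_tasks):
    """Generic Earliest-Deadline-First ordering (not the ACCORD/UML kernel).
    ready_tasks: list of (name, absolute_deadline); ties keep input order."""
    heap = [(deadline, i, name) for i, (name, deadline) in enumerate(ready_tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Hypothetical ready tasks with absolute deadlines in ms
order = edf_order([("maintainSpeed", 50), ("updateSpeed", 20), ("logging", 200)])
```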
ANALOG/DIGITAL CO-DESIGN



Frank Heuschen 
Klaus Waldschmidt 



Analog/Digital Co-Design is the central part of a design 
methodology for mixed signal systems. Additional degrees of 
freedom, provided by an enlarged design space, are mapped onto 
a space of possible implementations. An algorithmic analysis of 
this implementation space is used to find an optimized choice of 
architecture and an optimized partitioning. 

This paper presents the basic design flow of Analog/Digital Co-Design,
a graph based model of the implementation space and the
application of decision theory and optimization algorithms to this
model.



1. Introduction 

Computer aided design of embedded systems is becoming increasingly important
for economic and scientific reasons. Because of the accelerated progress in digital
technology, approaches in design automation mainly focus on the digital core
and its software. This has led to sophisticated solutions for Hardware/Software
Co-Design [4,6].

However, most embedded systems operate in an environment that is continuous in time and
value, yet analog signal preprocessing and signal conversion
are usually neglected. Pre-configured converters are inserted at the system
boundaries, often as an afterthought.

Methodical approaches to mixed signal system design are few, and existing design
tools provide hardly any automated design flows. The combined treatment of analog and
digital architectures is very promising, because it opens up considerable
optimization potential. On the other hand, the combined view presents an enlarged
design space that is usually hard to explore at the designer’s own discretion. The
main objective of design space exploration is to find a configuration that satisfies
functional requirements without violating nonfunctional constraints and with
efficient usage of given resources. Commonly used iterative strategies eventually lead the
designer to a suitable solution, but iterative refinement is time consuming
and a large part of the optimization potential is probably wasted.
As a methodology, Analog/Digital Co-Design is both an analogy to and an extension of
Hardware/Software Co-Design. We present one possible scenario of an
automated Analog/Digital Co-Design process, a design flow for mixed signal
systems with an emphasis on signal processing embedded systems. The central idea
is to transform a given design space into a graph-modeled implementation space.
The implementation space can be defined as the design space annotated with cost values.
These values are derived from the nonfunctional properties of the different implementation
possibilities. Any criteria that can be expressed with linear cost functions (chip
area, power consumption or simply price) may be combined into a multi objective
decision set.

Cost values are obtained either by special estimation methods or from the
manufacturer’s cell libraries. The graph model allows us to analyze the
implementation space with optimization algorithms, leading to a variety of valid
solutions. Because architectures and partitions are not chosen iteratively, constraint
violations need not be checked after each decision.

This paper presents a general description of the design flow in Analog/Digital
Co-Design, with an emphasis on the implementation space exploration mentioned above.
The graph model that enables algorithmic optimization is introduced as the embedded
system graph.

The application of simulated annealing is described, as well as appropriate multi
objective decision methods. An example concludes the paper.



2. Design Flow 

In this section, the design flow of Analog/Digital Co-Design is presented in more
detail.

At system level, the essential task of an embedded system design methodology is
to map the specification onto the given design platform.

The specification contains behavioral descriptions of signal processing functionalities
and/or algorithms. The design platform contains a set of implementable
subsystems. These are stored either as IP components, as system level models or as
parametrizable templates.

The first step is to analyze the specification and to identify functionally
independent subsystems. Each subsystem is assigned all library objects
that can perform its functionality. These are the so-called implementation
possibilities. In this way, the design platform is reduced to a design space for the
specified system.
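The first step can be sketched as a lookup from function to library objects. The library contents, the subsystem names and the helper `design_space` are hypothetical. The function symbols follow Table 2. The product of the vector lengths gives the number of configurations before converters are taken into account.

```python
import math

# Hypothetical library: function symbol (Table 2) -> implementation possibilities
LIBRARY = {
    "LP":   ["analog active filter", "analog passive filter", "digital FIR", "digital IIR"],
    "ALGO": ["DSP software", "microprocessor software", "FSMD hardware"],
}

def design_space(subsystems, library):
    """Assign each functionally independent subsystem every library object
    that can perform its function (its implementation possibilities)."""
    missing = sorted({func for _, func in subsystems if func not in library})
    if missing:
        raise KeyError(f"no library object for: {missing}")
    return {name: list(library[func]) for name, func in subsystems}

space = design_space([("filter", "LP"), ("processing", "ALGO")], LIBRARY)
n_configurations = math.prod(len(v) for v in space.values())  # ignoring converters
```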

To hold the design space and to perform the corresponding mapping process,
denoted by the swirl in Figure 1, we propose a graph based system model that
models the specified system and the directed signal flow with a two-dimensional
graph. The graph is then expanded with a third dimension to contain the space of
possible implementations. The model is called systemgraph and is explained in
more detail in the next section.



The second step, again symbolized by the swirl in Figure 1, is the assessment of
the design space with nonfunctional properties, or costs. The specification is now
examined more closely, and functional demands are transformed into nonfunctional
properties of the contained implementation possibilities. The values can be extracted
from IP datasets or, as in our approach, calculated by specific
estimation functions based on parametrizable templates. The
values are annotated to the implementation possibilities, so that the systemgraph also becomes a
model of the implementation space.
Figure 1. Design Flow

Once the implementation space is completely built, the analysis step is
performed, which produces the configuration worth implementing. The
analysis step combines combinatorial optimization with the application of
decision theory whenever more than one nonfunctional property is considered. This is
almost always the case in electronic design.

At this point, subsystems are represented either by RTL descriptions for digital parts
or by netlists for analog parts and converters. The subsequent synthesis steps are
common practice and do not lie within our research interest.
3. Implementation Space

3.1. Graph Model (informal description)

The implementation space of a mixed signal system is modeled by a graph
consisting, as usual, of nodes and edges. As mentioned above, this graph is called the embedded
systemgraph. Its properties are explained here informally.
For the formal definition please refer to [10] or [9].

Five different signal classes occur in a mixed signal system as shown in Table 1. 



Table 1. Signal classes in a mixed signal system

Signal   Time         Value        Bandwidth
A        continuous   continuous   unlimited
B        continuous   continuous   limited
C        discrete     continuous   limited
D        discrete     discrete     limited
G        continuous   discrete     limited



A signal processing system can be structured into functionally independent
subsystems. This is done on an abstract behavioral level of specification, regardless
of the processed signal class. These functionally independent subsystems are
modeled by the graph’s nodes.

In addition, each closed path of directed signal flow has to start and
terminate with an environmental node. These nodes carry no functionality (NOP). Instead,
they either produce an input signal and its specification (source) or consume and
define an output signal of the system (sink). The system’s signal flow is thus
depicted as a kind of block diagram (see Figure 2).



Table 2. Possible functional nodes

Symbol   Function
NOP      No operation, used for environmental nodes
LP       Lowpass
HP       Highpass
BP       Bandpass
BS       Bandstop
INT      Integration
INTDT    Integration, forced analog
INTDN    Integration, forced digital
DIF      Differentiation
DIFDT    Differentiation, forced analog
DIFDN    Differentiation, forced digital
ADD      Addition
SUB      Subtraction
MUL      Multiplication
DIV      Division
ALGO     Algorithmically specified subsystem



At the current stage of implementation, functionalities are identified as listed in
the (expandable) Table 2.





Figure 2. Simple mixed signal system

The third dimension of the implementation space is introduced by a vector of so-called
implementation possibilities included in each functional node. The length of
each vector corresponds to the number of functionally equivalent implementations. This
number varies with the different functions and may be limited by the designer
according to corporate policy, customer demand etc.

Each implementation possibility contains a set of variables for the nonfunctional
properties that belong to the multi objective decision set. In the current project stage,
only the nonfunctional properties of ideal circuits and implementations are
considered (for instance the active circuit area without wiring, the theoretical power
consumption or the function-related delay). However, the model already contains variables
for the difference between an ideal and a real implementation with respect to nonfunctional
properties. This concerns parasitic effects and equalizing extensions as
well as routed connections between functional subsystems. If these nonfunctional
properties of non-ideal effects cannot be determined statically by estimation
methods, they must be determined by simulation, also called dynamic
estimation.

Environmental nodes contain only one implementation possibility. Its sole
purpose is to specify the signals found at the system input(s) and the
signals required at the system output(s). Another purpose of the environmental nodes
is to propagate into the implementation space model any knowledge from the specification
that cannot be expressed with signal classes or functions (nonfunctional
constraints, for instance). An environmental node has no nonfunctional
property costs.

A system configuration is again a two-dimensional representation of the system, 
after selecting one implementation possibility in each node. Now, in a configuration, 
we face the problem of signal conversion, when subsequent node implementations 
process signals of different classes. If converter nodes were to be inserted at this 
point, two different configurations would hardly have the same number of nodes. 

Therefore each edge is always annotated with a converter shell, regardless of whether
or not a converter is needed (Figure 3). This ensures a constant graph size and allows us
to take the nonfunctional properties of wiring into account.

Figure 3. Converter shells in each edge



The embedded systemgraph is made of four different objects as shown in Table 3. 



Table 3. Graph objects

Object   Symbol   Use
Source   o-       Environment specification for input signals
Node     □        Functionally independent subsystem, specified according to Table 2
Edge              Signal transport; annotated with a container for different
                  converter implementation possibilities
Sink     -o       Environment specification for output signals



3.2 Analysis 

The evaluation of a configuration passes through three stages:

1. An implementation possibility is chosen in each node and its costs are
determined.

2. The need for signal conversion is determined, and zero costs (or the
equivalent of zero) are assigned to unneeded converters.

3. For each type of conversion (A/D, B/D, D/B), an implementation possibility is
chosen and the converter node is assigned its costs. The number of
converter implementation possibilities is usually limited by timing
constraints; otherwise, the choice of the converter architecture is again subject
to an optimization process.
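The three stages can be sketched for the low pass example of section 3.4. The sketch makes several assumptions. All costs are hypothetical single numbers. Each converter type has only one fixed implementation. An implementation possibility is written as (name, input signal class, output signal class, cost), with classes as in Table 1. A converter shell stays empty, at zero cost, when the output class of one node equals the input class of the next.

```python
# Hypothetical data. An implementation possibility is
# (name, input signal class, output signal class, cost); classes follow Table 1.
IN = ("IN", None, "A", 0)               # source: analog, unlimited bandwidth
OUT = ("OUT", "D", None, 0)             # sink: digital output required
LP_ANALOG = ("LP analog", "A", "B", 3)
LP_DIGITAL = ("LP digital", "D", "D", 6)
ALGO_DSP = ("ALGO on DSP", "D", "D", 10)

CONVERTER_COST = {("A", "D"): 5, ("B", "D"): 4, ("D", "B"): 4}  # one choice per type

def evaluate(configuration):
    """Stage 1: node costs. Stage 2: which edge shells need a converter.
    Stage 3: add the chosen converter's cost; empty shells cost zero."""
    total = sum(impl[3] for impl in configuration)
    shells = []
    for prev, nxt in zip(configuration, configuration[1:]):
        out_class, in_class = prev[2], nxt[1]
        if out_class == in_class:
            shells.append(None)                      # unneeded converter
        else:
            shells.append(f"{out_class}/{in_class}")
            total += CONVERTER_COST[(out_class, in_class)]
    return total, shells

config_1 = [IN, LP_ANALOG, ALGO_DSP, OUT]   # analog low pass, then B/D converter
config_2 = [IN, LP_DIGITAL, ALGO_DSP, OUT]  # A/D converter, then digital low pass
```

With these numbers, `evaluate(config_1)` yields a total of 17 with a single B/D converter. `evaluate(config_2)` yields 21 with a single A/D converter.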

Because of the complexity of the embedded systemgraph, we chose to use
heuristic metagraph algorithms for the implementation space analysis. Our first
objective is an implementation of Simulated Annealing [1]. Tabu Search
[3] could later be adapted as an alternative.

Each system configuration corresponds to a node of a hypergraph that models the
neighbor relations between configurations. A neighboring configuration is one that differs in a
single implementation possibility in one given functional node. Simulated Annealing
starts at a given, evaluated configuration, analyzes the costs of a neighboring
configuration and moves to it with a certain probability. The neighboring configuration is chosen
randomly by modifying the active implementation possibility in a randomly chosen
functional node.
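The neighborhood move described above can be combined with the standard Metropolis acceptance rule. The following Python sketch is one possible implementation, not the authors' tool. It assumes a geometric cooling schedule, and the values of `t0` and `alpha` are arbitrary choices. A configuration is a tuple holding the index of the active implementation possibility in each node. A neighbor changes exactly one node to a different possibility. Worse neighbors are accepted with probability exp(-delta/T).

```python
import math
import random

def anneal(options, cost, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Simulated Annealing over configurations.
    options[i]: number of implementation possibilities of node i.
    A configuration is a tuple of chosen indices; a neighbor differs in
    exactly one node. Metropolis acceptance, geometric cooling."""
    rng = random.Random(seed)
    current = tuple(0 for _ in options)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        node = rng.randrange(len(options))
        if options[node] > 1:                          # e.g. skip environmental nodes
            choice = rng.randrange(options[node] - 1)
            if choice >= current[node]:
                choice += 1                            # a *different* possibility
            neighbor = current[:node] + (choice,) + current[node + 1:]
            neighbor_cost = cost(neighbor)
            delta = neighbor_cost - current_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = neighbor, neighbor_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= alpha
    return best, best_cost

# Hypothetical separable cost table: IN, low pass (4 options), ALGO (3 options), OUT
COSTS = [[0], [5, 3, 8, 4], [7, 2, 6], [0]]
best, best_cost = anneal([len(c) for c in COSTS],
                         lambda cfg: sum(COSTS[i][k] for i, k in enumerate(cfg)))
```

A high start temperature lets the search accept many uphill moves. This is one way the start configuration and the temperature shape the partitioning strategy, as described next.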

Depending on the start configuration and on the temperature of the algorithm, the
behavior of the algorithm models different partitioning strategies.

For example, “find the partition with the maximal digital domain” or “try to enlarge
the analog domain” are common strategies in mixed signal design.



3.3 Decision Theory 

From the decision theory point of view, the analysis and optimization of an
implementation space is a single-actor, multi objective decision problem under
certainty and without risk [7]. Any nonfunctional property that can be evaluated by
linear cost functions may be included in the multi objective set.

Nonfunctional properties and their associated constraints represent either
conflicting or linked goals. Optimizing chip area and
delay is an example of conflicting goals: an improvement in one cost function leads
to a deterioration in the other. Chip area and power dissipation, for example, are linked
goals. The optimal solution of such a multi objective decision is usually defined as a
Pareto optimum. Since Simulated Annealing randomly jumps from configuration to
configuration, one cannot observe the development of each objective simultaneously.
Goal programming is used instead. First, each optimization goal is optimized
individually. The results are combined into the (only theoretically existing) goal vector.

The optimization algorithm then starts again, but now the complete objective set is
compared to the goal vector using an Lp metric. Metric elements may have additional
weights to encourage optimization towards desired objectives. The Pareto optimal
solution can be approximated in this way; the solution found should at least be an element of the
Pareto set [8].
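The goal programming step can be illustrated on a handful of already evaluated configurations. The sketch below makes three assumptions. The goal vector is obtained by an exhaustive minimum per objective, whereas the paper uses separate optimization runs. The weighted L2 metric (p = 2, equal weights) is used. The (chip area, delay) values are hypothetical. A Pareto-dominance check confirms that the selected configuration belongs to the Pareto set.

```python
def goal_vector(points):
    """Each objective optimized individually (here: exhaustively)."""
    return [min(p[i] for p in points) for i in range(len(points[0]))]

def lp_distance(values, goal, weights, p=2):
    """Weighted Lp metric: (sum_i w_i * |f_i - f_i*|**p) ** (1/p)."""
    return sum(w * abs(v - g) ** p for v, g, w in zip(values, goal, weights)) ** (1 / p)

def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere
    (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Hypothetical (chip area, delay) values of four evaluated configurations
points = [(10, 5), (6, 9), (8, 6), (9, 8)]
goal = goal_vector(points)                   # only theoretically reachable
best = min(points, key=lambda f: lp_distance(f, goal, weights=(1, 1)))
pareto = [a for a in points if not any(dominates(b, a) for b in points)]
```

Here the goal vector is (6, 5), which no single configuration reaches. The configuration (8, 6) is closest to it and is not dominated by any other configuration.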
3.4. Example

Let us assume, for the simple mixed signal system depicted in Figure 2, that an analog
input signal is to be filtered with a low pass and then processed by an algorithm.
The algorithm’s results are to leave the system as a 16-bit digital signal.

Figure 4. Configurations of the simple system

It is easy to see that the converter node after the algorithm always remains
empty. But, as shown in Figure 4, there exist at least two different configurations for the
signal preprocessing:

1. The low pass is implemented with analog circuits, the first converter is
unneeded and the second converter is a B/D converter.

2. The first converter is an A/D converter, the low pass is implemented digitally and
the second converter is unneeded.

Note that in reality many more configurations exist for this simple
system. An analog low pass behavior can be implemented with at least ten different
circuits, each with its own functional and nonfunctional properties. A digital filter
can be implemented as one of four FIR circuits, one of eight IIR circuits, or as an
FIR/IIR software realization on a DSP or on a microprocessor [2], again each with its
own functional and nonfunctional properties.

The number of configurations is further multiplied by three, according to the three conversion
speeds: serial, weighing and parallel conversion. A digital FSMD implementation
may have different space or timing requirements (nonfunctional properties)
depending on scheduling and synthesis specifications.

Even this simple embedded system has a considerable implementation space that
can hardly be explored without computer aid. Even for such a simple system,
implementation space exploration may therefore be worthwhile to find the
configuration that satisfies functional requirements, does not violate nonfunctional
constraints and makes efficient use of given resources.



4. Summary and Conclusion 

A top-down design method for mixed signal systems was introduced, with the main
focus on signal processing embedded systems. The key feature of this Analog/Digital
Co-Design is the joint consideration of analog and digital implementations in a
common design space.

The design space is modeled with a formal graph representation. By annotating
each design alternative with costs for its nonfunctional properties, the design
space is transformed into a so-called implementation space. The implementation space
is modeled with the same graph representation, which is called the systemgraph. The
systemgraph is provided with cost values by means of estimation, simulation or
synthesis. Different system configurations and their costs can be extracted from the
graph. Together they form a configuration hypergraph, which can be optimized
with probabilistic heuristics such as simulated annealing.

An example has demonstrated that a large implementation space can be found
even for the simplest embedded system. Such a space cannot be handled effectively with
non-automated or iterative methods.

We believe Analog/Digital Co-Design to be a promising approach to improve the 
quality and the level of automation in mixed-signal design. 



Acknowledgements 

This research project is partly funded by the Deutsche Forschungsgemeinschaft
under reference WA 357/14; the funded research focuses on the estimation of
nonfunctional properties for signal processing circuits.



References 

[1] E. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines. Wiley & Sons
Inc., 1988.

[2] S. A. Azizi. Entwurf und Realisierung digitaler Filter. Oldenbourg Verlag, München,
Wien, 5th edition, 1990.

[3] R. Battiti and G. Tecchiolli. The Reactive Tabu Search. In ORSA Journal on Computing,
1993.

[4] Klaus Buchenrieder. Hardware/Software Co-Design: An Annotated Bibliography.
IT-Press, Chicago, 1995.

[5] Christoph Grimm and Klaus Waldschmidt. "Repartitioning and technology mapping of
electronic hybrid systems". Design Automation and Test in Europe 98 (DATE), Paris,
France, February 1998.

[6] Sanjaya Kumar, James Aylor, Barry Johnson and W. Wulf. The Codesign of Embedded
Systems: A Unified Hardware/Software Representation. Kluwer Academic Publishers,
Boston, Dordrecht, London, 1996.

[7] Anatol Rapoport. Decision theory and decision behaviour: normative and descriptive
approaches. Kluwer Academic Publishers, Dordrecht, 1989.

[8] P. L. Yu. Multiple-Criteria Decision Making. Plenum Press, New York, 1985.







Architecture and Design of Distributed Embedded Systems 



[9] Frank Heuschen, Christoph Grimm, Klaus Waldschmidt. "Modellierung des
Implementierungsraumes im Analog/Digital Co-Design". In Methoden und
Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und
Systemen, 3rd GI/ITG/GMM Workshop, Frankfurt, Germany, February/March 2000.
Proceedings: ISBN 3-8007-2524-X, VDE Verlag, Berlin, Offenbach, 2000.

[10] Frank Heuschen, Holger Schmitt, Klaus Waldschmidt. "Formale Modellierung eines
gemischt analog/digitalen Systems". In Methoden und Beschreibungssprachen zur
Modellierung und Verifikation von Schaltungen und Systemen, 2nd GI/ITG/GMM
Workshop, Braunschweig, Germany, February 1999. Proceedings: ISBN 3-8265-4684-9,
Shaker Verlag, Aachen, 1999.




A DESIGN METHODOLOGY FOR 
EMBEDDED SYSTEMS BASED ON 
MULTIPLE PROCESSORS 

Luigi Carro, Flávio Wagner, Márcio Kreutz,

Marcio Oyamada 



This paper presents S3E2S, a CAD environment that allows the
specification, co-simulation and synthesis of embedded electronic
systems composed of a combination of analog parts, digital
hardware, and software. S3E2S is based on a distributed,
object-oriented system model, where abstract objects are initially
used to express complex behavior and may later be refined into
digital or analog hardware and software. The target architecture
is a set of processors, each possibly with a different
architecture, covering microcontrollers, digital signal processors
and RISC-like architectures. S3E2S offers a semi-automatic
mechanism for selecting the processor that is best suited to
implement the function of each object in the system model.
Besides performance, the synthesis step of S3E2S takes into
account other design goals, such as power consumption and area.
This paper presents the processor selection methodology, as well
as results from different applications.



1. Introduction 

Embedded electronic systems contain a combination of software and hardware, both
analog and digital. Although simple systems can be implemented with a single,
off-the-shelf microcontroller, digital signal processor or conventional microprocessor
and associated software, more complex systems with critical requirements regarding
area, speed, and power consumption call for a dedicated design. Various target
architectures can be considered for matching different requirements. Solutions may
include dedicated processors and/or ASICs, or even multi-processor platforms,
combined with dedicated analog parts.

Typical system examples are portable multimedia devices, industrial distributed 
controllers or vehicle supervision systems. All these systems demand digital signal 
processing, analog circuits to interface with the real world, radio frequency 
communication links and scalar processing for database lists, display and keyboard 
control. The current trend is to have a mix of behaviors in the same system-on-chip, 
requiring different design styles and processor architectures. 
The design of such complex embedded systems encompasses a suite of different
technologies, tools, and design styles. A complete design environment must consider 
system specification, partitioning among software, digital hardware and analog parts, 
and synthesis of software and hardware parts and interfaces. The design should 
ideally proceed from an initial, abstract specification, going through a sequence of 
successive refinements, until a final detailed solution is achieved. Intermediate, 
heterogeneous descriptions generated during this stepwise refinement must be 
validated, usually by co-simulation. 

In the S3E2S (Specification, Simulation, and Synthesis of Embedded Electronic
Systems) environment, complex systems may be modeled not only at different
abstraction levels, but also in different domains: abstract behavior (expressed by a
high-level, object-oriented specification), digital hardware, analog hardware, and
software. Co-simulation is supported by coupling different simulation engines, so
that any heterogeneous model developed during stepwise refinement can be
simulated (Wag, 1999).

The S3E2S environment allows easy exploration of the design space at the
multi-processor level, selecting the combination of processors that best matches the
design requirements, regarding not only speed but also area and power. This
paper covers the processor selection features of S3E2S.

The remainder of this paper is organized as follows. The next section compares
S3E2S with other design approaches. Section 3 introduces the methodology for
processor selection. Section 4 discusses case studies that illustrate the capabilities
of the design environment. Section 5 presents conclusions and future work.



2. Related work 

The design of a simple embedded system can be solved with a single microprocessor
or microcontroller and its associated software. In the case of complex designs,
however, many issues regarding system specification, simulation and synthesis arise.
The first hard task is to specify the desired behavior. The description of complex
systems through a formal and abstract language is an open issue (Adam, 1996; Bot,
1998). Recently, a first attempt was made to define a benchmark for system-level
specification, but with no clear conclusion on which specification approach would
lead to the best results (Neb, 1999; Mos, 1999). Two basic approaches have been
proposed: specifying the system with a single language or model, and specifying it
as a heterogeneous system by combining different languages or models.

Ptolemy (Kal, 1995), for instance, is an environment for simulation and
prototyping of heterogeneous systems, using object-oriented technology. Ptolemy
accomplishes multi-paradigm simulation by internally supporting different
mechanisms, called domains. Thus, objects in Ptolemy can be simulated using
models such as Synchronous Data Flow, Dynamic Data Flow, discrete event, and
the analog domain (Pin, 1998). Another environment allowing the specification and
simulation of heterogeneous systems is described by Jerraya and Ernst (Jer, 1999),
where a backbone in the operating system implements communication among the
dedicated simulators needed to validate a heterogeneous model specified
with different languages. MCI (Hes, 1999) is also a generic mechanism for
integrating different simulators to validate multi-language specifications.

The description of complex systems through a single, abstract language has also
been proposed. Some approaches following this strategy adopt an object-oriented
specification to describe both software and hardware (Bot, 1998; Wol, 1997; Woo,
1997; Aig, 1997). Most work on hardware/software co-design focuses on the
synthesis of dedicated hardware or of a dedicated instruction set processor (Adam,
1996; Kal, 1995; Wol, 1997; Mrv, 1998). In the Polis system, the target architecture
is a set of commercial processors (Chi, 1994). However, all these processors have the
same characteristics: they are microcontrollers targeted to embedded control, not to
data-intensive applications. The synthesis style is based on software synthesis and
performance estimation techniques.

S3E2S combines the advantages of the multi-language, heterogeneous approach
with an abstract, object-oriented specification (Wag, 1999). In S3E2S we also aim
at using as much software as possible, in order to reduce system cost and
design time, and the target architecture also follows a multi-processor paradigm.
Instead of a fixed target architecture devoted to ASICs or ASIPs, however, S3E2S
synthesis is based on a library of processors, each with different characteristics,
ranging from microcontrollers to digital signal processors. Therefore, unlike
Polis, our target system can combine data-dominated and control-dominated
behaviors, and the system tries to find the best processor (according to given
design criteria) for each task.



3. Object evaluation and processor selection 

The synthesis step in S3E2S is not targeted at a single, specific processor architecture.
Instead, it allows easy design space exploration at the multi-processor level, whereby
different processor architectures are analyzed, and those best matching the desired
application requirements are selected and combined for design refinement.
Moreover, in S3E2S we try to use available processors as much as possible, in order
to reduce hardware costs and design time. Since many microprocessors with different
costs, architectures, and power consumption are available today, using them in the
design cycle generally turns out to be a flexible and low-cost solution. We must also
consider that designs are seldom started from scratch. Most companies try to reuse
previously designed boards, multi-chip modules or IP processors for which a library
of software modules is available. Furthermore, small and medium companies rarely
have the capital to invest in high-volume, single-chip solutions. Programmable
processors are therefore a natural choice for starting a new product.

After modeling the system in the object-oriented environment, the primary goal in
obtaining a working system is to map all objects to one or more physical processors.
This strategy assumes that different commercial processors are available today,
with different cost/performance ratios, ranging from high-performance DSPs to
low-power microcontrollers.
3.1 Object evaluation

In (Suz, 1996) the evaluation of software performance is based on a two-step
procedure. First, a high-level, processor-independent representation such as a CDFG
is obtained, and then the CDFG is translated into C code for the target processor. In
S3E2S we also use an intermediate description of the code to be executed: the
behavior of each object is evaluated on a CDFG structure. Unlike (Suz, 1996), whose
work targets controllers, S3E2S does not constrain software modules to any
predefined template, so an object may exhibit any behavior. Therefore, the typical
behavior of the object code must be determined: a) control-dominated, as in FSMs for
controllers; b) data-intensive computations, as in digital filters; or c) memory-intensive
computations, as in list processing or database searching in a building entrance
control system, for example.

Each object is targeted to the processor that best implements its behavior. The
criteria for choosing the best processor are based on the processor characteristics
relevant to executing the desired code. For example, a DSP processor with a deep
pipeline pays a high branch penalty and is poorly suited to a control-intensive
application. On the other hand, if a low-cost microcontroller can be used in a slowly
varying process that requires digital filtering of widely spaced samples, this
solution should also be offered to the designer as an option.

From the CDFG, 3-address code for a virtual machine is generated. In fact, three
different virtual machines are used, each for a specific family of target
architectures (microcontrollers, DSP processors, and RISC architectures). The
purpose of this specialization is to improve the predictability of software
performance when executing on a certain class of processors. For instance, the
virtual machine for microcontrollers has only 2 working registers, and most operations
use the internal accumulator. Consequently, most operands must be kept in memory,
which slows down execution. This reflects the actual characteristics of real
microcontrollers.

In the virtual machine targeted to DSP applications, memory references are treated
as registers, and special instructions such as multiply-accumulate (MAC) are identified
in the code. This mimics the fact that DSP architectures are targeted to
data-dominated applications, so memory is accessed in a pipelined fashion with a small
timing penalty. On the other hand, control-dominated programs often break the
pipeline, incurring a timing penalty. Finally, the RISC-like virtual machine has a
large register set, and operations are performed register to register; we assume a
limit of 32 registers. The RISC-like virtual machine therefore favors complex
computations (even filters with a small number of taps), up to the point where more
data are required than can be stored in the internal registers.
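To illustrate why the register count of a virtual machine matters, the following Python sketch counts how many operand references of a short 3-address sequence must go to memory for a given register budget. The tuple format `(dest, op, src1, src2)`, the FIR-like example and the naive allocation rule (the first variables encountered keep a register for the whole program) are assumptions for illustration; the paper does not describe how its virtual machines allocate registers.

```python
# Hypothetical 3-address instructions: (dest, op, src1, src2).
FIR_LIKE = [
    ("t1", "mul", "c0", "x0"),
    ("t2", "mul", "c1", "x1"),
    ("t3", "add", "t1", "t2"),
    ("t4", "mul", "c2", "x2"),
    ("y",  "add", "t3", "t4"),
]


def memory_references(code, num_registers):
    """Count operand references that must go to memory.

    Naive static allocation (an illustrative assumption): the first
    `num_registers` distinct variables, in order of first use, stay in
    registers; all other variables live in memory.
    """
    in_register = set()
    seen = set()
    mem_refs = 0
    for dest, _op, src1, src2 in code:
        for var in (src1, src2, dest):
            if var not in seen:
                seen.add(var)
                if len(in_register) < num_registers:
                    in_register.add(var)
            if var not in in_register:
                mem_refs += 1
    return mem_refs


for regs in (2, 32):
    print(regs, memory_references(FIR_LIKE, regs))  # prints "2 13", then "32 0"
```

With the 2 working registers of the microcontroller machine, 13 of the 15 operand references hit memory, while the 32 registers of the RISC-like machine hold every value of this small example.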

The next step is object analysis, which determines the dominant characteristic of
the object: a) control-intensive: many control instructions and flow breaks;
b) memory-intensive: list processing, digital filtering, heavy memory usage; or
c) data-intensive: few memory accesses, with most processing done in internal
registers. Each of these characteristics favors a different processor in the library.

Let M be the total number of cycles spent on memory accesses in the internal
3-address code, P the number of cycles needed to execute all data transformations
(add, sub, and, mult, etc.), and C the total number of cycles taken by test-and-branch
(control) instructions. These numbers are obtained from the 3-address code and are
thus specific to each virtual machine. The total cycle count of the application is
thus P + M + C. Let $AP_x$ (Application Profile) be the relative importance of each
behavior x compared with the others, expressed as

$$AP_P = \frac{P}{P + M + C} \tag{1}$$

$$AP_M = \frac{M}{P + M + C} \tag{2}$$

$$AP_C = \frac{C}{P + M + C} \tag{3}$$

Equations 1 to 3 show which architectural feature matters most for obtaining the
maximum gain while executing the modeled object. For example, if an application
has an $AP_C$ of 0.7, it is control-dominated, and there is no point in using a DSP
processor to implement it.
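Equations 1 to 3 and the dominance check can be sketched in a few lines of Python. The cycle counts are hypothetical, chosen to reproduce the $AP_C = 0.7$ case from the text, and the labels follow the classification of the object analysis step.

```python
def application_profile(p, m, c):
    """Equations 1 to 3: relative weight of data, memory and control cycles."""
    total = p + m + c
    if total == 0:
        raise ValueError("the cycle counts must not all be zero")
    return {"P": p / total, "M": m / total, "C": c / total}


BEHAVIOUR = {"P": "data-intensive", "M": "memory-intensive", "C": "control-intensive"}


def dominant_behaviour(profile):
    """Name the behaviour with the largest share of cycles."""
    return BEHAVIOUR[max(profile, key=profile.get)]


# Hypothetical cycle counts for a control-dominated object.
profile = application_profile(p=20, m=10, c=70)
print(profile, dominant_behaviour(profile))
```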

A group of objects can also be mapped to a single processor so that the 
application may fit in a smaller number of processors. However, all actions that the 
design requires to run in parallel must be allocated to different processors. 
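The parallelism constraint on grouping can be checked mechanically. In this Python sketch the object names and processor instance names are hypothetical; objects are mapped to processor instances rather than processor types, since two parallel objects may still run on two separate 8051s.

```python
def parallelism_conflicts(mapping, parallel_pairs):
    """Return the pairs of objects that must run in parallel but were
    mapped to the same processor instance."""
    return [(a, b) for a, b in parallel_pairs if mapping[a] == mapping[b]]


# Hypothetical objects and processor instances, for illustration only.
mapping = {"ocr": "cpu0", "dictionary": "cpu1", "display": "cpu1"}
parallel_pairs = [("ocr", "dictionary"), ("dictionary", "display")]
print(parallelism_conflicts(mapping, parallel_pairs))  # [('dictionary', 'display')]
```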

3.2 Processor analysis 

In order to implement the processor selection procedure, the processors available
in the library must be pre-characterized. Some of the processor characteristics that
are analyzed and included in each virtual machine are: the word size; the types of
instructions; the memory operand access modes; the number of buses used to access
memory; the execution time of each instruction; the type of memory; the number of
registers; the control instructions; the use and depth of a pipeline, if any; and
whether a Harvard architecture is used.

These characteristics provide a high-level abstraction of a processor from a 
behavioral point of view. They can also be used to classify application-specific 
processors, like those devoted to DSP. 

We have characterized three processors, one from each of the three architecture
families, to give an idea of their different performance metrics. The processors
described in the library are the 8051 microcontroller (Int, 1985), the C25 digital
signal processor (Tex, 1997), and the Risco microprocessor, a 32-bit RISC-like
microcontroller (Car, 1996). Table 1 shows some of the processor characteristics
stored in the library.

For example, since the C25 has a DSP architecture, memory accesses and
computations take the same number of cycles. This favors data-intensive
applications. On the other hand, a RISC machine with many registers favors
computations with few memory accesses. At the same time, the cost of a branch is
higher in the C25, due to the effect of a possible pipeline flush; the added cost of
the flush is included in the table. In the C25, internal memory is treated as a
register bank, due to its short access time and the special indexing registers
available in the architecture.
Table 1 - Partial processor characteristics

Processor   # of registers   Jump cycles   Mem access cycles   # of buses   Operation cycles   Power (mW)
8051        8                3             2                   1            3                  40
Risco       32               1+2           2                   1            1                  90
C25         544              4+2           1 (internal)        2            1                  210



3.3 Processor selection 

For each processor in the library we must obtain its Performance Factors for the
application. They are given by the following equations:

$$PF_{P,i} = \frac{P_i}{P_i + M_i + C_i} \tag{4}$$

$$PF_{M,i} = \frac{M_i}{P_i + M_i + C_i} \tag{5}$$

$$PF_{C,i} = \frac{C_i}{P_i + M_i + C_i} \tag{6}$$

where the index i stands for a certain processor, and $P_i$, $M_i$ and $C_i$ are the
relative costs of the processor instructions to execute data transformations, memory
accesses and control operations, respectively. These costs depend on the processor
characteristics introduced in the previous section. In this analysis, each 3-address
code instruction of the application is assumed to generate a single instruction in the
target processor.
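The following Python sketch evaluates Equations 4 to 6 with the cycle costs of Table 1. The paper does not state exactly how $P_i$, $M_i$ and $C_i$ are derived from the table; here they are assumed to be the application's data, memory and control instruction counts multiplied by the processor's operation, memory-access and jump cycle costs, with the listed flush penalty added (for example "4+2" counted as 6). The instruction counts are hypothetical.

```python
# Per-class cycle costs from Table 1. Jump costs include the pipeline-flush
# penalty listed in the table ("1+2" -> 3, "4+2" -> 6); the C25 memory cost
# refers to its internal memory.
PROCESSORS = {
    "8051":  {"op": 3, "mem": 2, "jump": 3},
    "Risco": {"op": 1, "mem": 2, "jump": 1 + 2},
    "C25":   {"op": 1, "mem": 1, "jump": 4 + 2},
}


def performance_factors(n_p, n_m, n_c, cycles):
    """Equations 4 to 6 for one processor.

    Assumption: each 3-address instruction maps to one target instruction,
    so P_i, M_i and C_i are instruction counts weighted by cycle costs.
    """
    p_i = n_p * cycles["op"]
    m_i = n_m * cycles["mem"]
    c_i = n_c * cycles["jump"]
    total = p_i + m_i + c_i
    return {"P": p_i / total, "M": m_i / total, "C": c_i / total}


# Hypothetical instruction counts: 10 data, 5 memory and 5 control instructions.
for name, cycles in PROCESSORS.items():
    pf = performance_factors(10, 5, 5, cycles)
    print(name, {k: round(v, 2) for k, v in pf.items()})
```

The expensive jumps of the C25 show up as a large $PF_C$ share, matching the remark that a deep pipeline penalizes control-intensive code.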

The simplest way to choose a processor would be to pick the one that has the
smallest Performance Factor in the critical characteristic of the application. This
would mean that the processor performs well while executing the critical part of the
required code. This simplification could, however, lead to a non-optimal solution.
In some applications the differences between Performance Factors may be small, or
the processors may have complementary characteristics (for example, processor P1
with $PF_{P,1} = 0.6$, $PF_{M,1} = 0.1$ and $PF_{C,1} = 0.3$, versus processor P2 with
$PF_{P,2} = 0.6$, $PF_{M,2} = 0.3$ and $PF_{C,2} = 0.1$). Our approach therefore
balances all three factors.

Consider a particular application, for which $AP_P$, $AP_M$ and $AP_C$ have been
calculated. Consider also a processor whose Performance Factors for this application
have been calculated. The Application Performance Distance (APD) for this pair
(application × processor) is obtained by the following distance measure:

$$APD_i = \sqrt{(PF_{P,i} - AP_P)^2 + (PF_{M,i} - AP_M)^2 + (PF_{C,i} - AP_C)^2} \tag{7}$$

where the index i stands for a certain processor. Equation 7 measures how far the
processor is from the ideal virtual machine for executing the code. The processor
with the smallest distance is probably best suited to execute the application, since
it has a small overhead across the three types of instructions. Moreover, to estimate
the execution time on the target processor, we take the instructions executed in the
virtual machine, the clock frequency of each processor and the number of cycles each
processor takes to execute an instruction. This gives a rough estimate of the
processor performance, sufficient to decide whether the processor can meet the
required timing.
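Equation 7 and the rough timing estimate can be sketched as follows in Python. The Performance Factors of P1 and P2 are those of the example above; the application profile, instruction counts, cycle costs and the 10 MHz clock are hypothetical, and the cycle costs are assumed to be expressed in clock periods. Both processors have the same $PF_P$, so ranking by the data-transformation factor alone cannot separate them, while the APD does.

```python
import math


def apd(pf, ap):
    """Equation 7: Euclidean distance between a processor's Performance
    Factors and the application profile."""
    return math.sqrt(sum((pf[k] - ap[k]) ** 2 for k in ("P", "M", "C")))


def execution_time_s(n_p, n_m, n_c, cycles, clock_hz):
    """Rough execution-time estimate: instruction counts weighted by per-class
    cycle costs, divided by the clock frequency (assumes the cycle costs are
    expressed in clock periods)."""
    total_cycles = n_p * cycles["op"] + n_m * cycles["mem"] + n_c * cycles["jump"]
    return total_cycles / clock_hz


# Performance Factors of the two processors from the example in the text,
# compared against a hypothetical application profile.
candidates = {"P1": {"P": 0.6, "M": 0.1, "C": 0.3},
              "P2": {"P": 0.6, "M": 0.3, "C": 0.1}}
app = {"P": 0.6, "M": 0.15, "C": 0.25}

distances = {name: apd(pf, app) for name, pf in candidates.items()}
print(min(distances, key=distances.get))  # P1

# Hypothetical instruction counts and cycle costs, with a 10 MHz clock.
print(execution_time_s(100, 50, 50, {"op": 1, "mem": 2, "jump": 3}, 10e6))
```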

In the design of embedded systems, however, performance is not the only issue. 
For certain applications, there are many processors that could achieve the required 
performance. Other important aspects must come into play, like power dissipation 
and area. Moreover, an important point regarding reuse is the ability to answer the 
question "can a specific board or SOC be reused in this new application?". To 
answer this, the CAD system must evaluate all other aspects. In case more than one 
processor executes the object code in the required time, other aspects like power and 
area of the processor families may be compared, so that the best solution regarding 
all system aspects is achieved. In this work, area is evaluated in terms of FPGA cells 
used for the design of each processor core. 

4. Results 

In order to illustrate the concepts presented in this paper, we have applied the
processor selection methodology to various examples, as shown in Table 2. In all
cases, the whole system functionality has been implemented by a single object, so
that a single processor has been selected to implement the function. Biquad is the
classical biquad filter, while scrambler, descrambler, coder and decoder are parts of a
modem system, as is the echo canceller. OCR is a neural network devoted to
character recognition. The Podos system is an integrated circuit that measures the
distance a person walks or runs. It is placed on the shoe and communicates with a
display on the person’s wrist. The distance is computed by double integration of the
acceleration along two axes.

Somewhat larger examples are the Crane Control and the Translating Pen. The
first one was proposed in (Neb, 1999) as a benchmarking attempt in the area
of system-level modeling and synthesis. The physical plant is composed of a crane
with a load, moving along a track. The physical system is modeled by a set of
differential equations, which describe the behavior of the crane with a load and the
external forces applied to it. The control algorithm of the Crane is implemented as
a discrete computation of the state-variable method (Wag, 1999).

The Translating Pen has an optical sensor that slides over characters, recognizing
words that are then translated using a dictionary. For this example, two objects were
modeled, splitting the system into the optical character recognition part, which uses a
neural network, and the list processing part, which uses a hash table to find the words
in memory.

The results for the above examples are given in Table 2. Two selection
mechanisms have been applied. Selection 1 is based on the calculation of the APD, as
explained in the previous section. In this mechanism, the maximum allowed values
for execution time, power and area are used as requirements that must be met ("no"
means no requirement). Selection 2 determines which processors meet the maximum
required execution time and selects, among those, the processor with the minimum
power and area requirements.
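Both selection mechanisms can be sketched in Python. The APD values come from the biquad rows of Table 2 and the power values from Table 1, but the execution times and FPGA areas are invented so that the limits of Table 2 produce the same choices; ordering Selection 2 by power first and area second is an assumption, since the text only says "minimum power and area".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    apd: float       # from Table 2
    time_ms: float   # estimated execution time (hypothetical here)
    power_mw: float  # from Table 1
    area_le: float   # FPGA logic elements (hypothetical here)


def meets(c: Candidate, max_time: Optional[float], max_power: Optional[float],
          max_area: Optional[float]) -> bool:
    """None stands for "no requirement" ("no" in Table 2)."""
    return ((max_time is None or c.time_ms <= max_time)
            and (max_power is None or c.power_mw <= max_power)
            and (max_area is None or c.area_le <= max_area))


def selection_1(cands, max_time=None, max_power=None, max_area=None):
    """Smallest APD among the processors that meet every given limit."""
    ok = [c for c in cands if meets(c, max_time, max_power, max_area)]
    return min(ok, key=lambda c: c.apd).name if ok else None


def selection_2(cands, max_time=None):
    """Among processors meeting the time limit: lowest power, then lowest area."""
    ok = [c for c in cands if meets(c, max_time, None, None)]
    return min(ok, key=lambda c: (c.power_mw, c.area_le)).name if ok else None


# Biquad rows of Table 2 (time and area values are invented).
biquad = [Candidate("8051", 0.91, 0.8, 40, 600),
          Candidate("Risco", 0.55, 0.3, 90, 900),
          Candidate("C25", 0.00, 0.1, 210, 2000)]
print(selection_1(biquad, max_time=1, max_power=50))   # 8051 (biquad 1)
print(selection_1(biquad, max_time=1, max_area=1050))  # Risco (biquad 2)
print(selection_2(biquad, max_time=1))                 # 8051
```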
It can be noticed that, for the biquad 1 and Podos examples, the Selection 1
procedure chooses the 8051 microcontroller, although the C25 is the best processor
regarding speed and has the smallest APD, and the Risco is the second-best choice.
However, both the C25 and the Risco are excluded due to user-defined limitations in
power or in area. A similar case occurs with biquad 2, where the C25 would be the
best choice regarding the APD, but it is excluded because of the area requirement,
and the Risco is chosen.

When the Selection 2 procedure is applied, the 8051 is chosen in most cases,
because it meets the maximum execution time and has lower power and area
requirements than the C25 and the Risco. The only exceptions are the filter and
echo canceller examples, where the 8051 is excluded because of time limitations. In
both cases the Risco is chosen, because it has lower power and area requirements
than the C25.

As can be seen, S3E2S can not only guide the design process, but can also help
the designer in the specification phase when buying an IP core or developing a
new architecture. After processor selection, C code for the selected processor is
generated, and a dedicated commercial compiler is used to obtain the final object
code.

Table 2 - Results

Application      APD 8051   APD Risco   APD C25   Max. time   Max. power   Max. area   Selection 1   Selection 2
Biquad1 (1)      0.91       0.55        0.00      1           50           no          8051          8051
Biquad2 (2)      0.91       0.55        0.00      1           no           1050        Risco         8051
                 0.74       0.42        0.37      0.02        no           no          C25           Risco
Podos (1)        0.64       0.31        0.35      10          50           no          8051          8051
                 0.37       0.21        0.43      2           no           no          Risco         8051
                 0.67       0.08        0.42      1           no           no          Risco         8051
                 0.16       0.49        0.41      1           no           no          C25           8051
Echo-canc. (3)   0.76       0.12        0.46      0.02        no           no          Risco         Risco
Scrambler        0.69       0.16        0.29      0.0125      no           no          Risco         8051
Descrambler      0.78       0.14        0.26      0.0125      no           no          Risco         8051
Coder            0.77       0.26        0.23      0.02        no           no          C25           8051
Decoder          0.37       0.33        0.39      0.02        no           no          Risco         8051

Time is given in ms; power is given in mW; area is given in LEs (logic elements) of an FPGA.

(1) C25 and Risco do not match the power requirement.

(2) C25 does not match the area requirement.

(3) 8051 does not match the time requirement.



5. Conclusions and future work 

An integrated CAD environment for embedded systems must consider important
aspects such as system specification, validation, and synthesis. Various
approaches have been proposed in the literature to cope with these issues. Most
environments have a fixed target architecture, consisting of a single processor and
possibly some peripheral ASICs. These synthesis approaches concentrate on the task
of partitioning system functions between hardware and software. S3E2S, in turn,
performs a synthesis based on a library of processors, ranging from
microcontrollers to ASIPs and DSPs. Each processor is characterized by a set of
parameters, and the environment tries to match each object of the application
(considering the application profile) to the most adequate processor. The final
architecture is therefore a multi-processor platform.

Future work includes: the expansion of the processor library; the development of 
larger examples that really require multi-processor platforms; the generalization of 
the co-simulation mechanism in order to allow the integration of other specification 
languages, as in (Hes, 1999); and the development of algorithms to explore grouping 
of objects into a single processor, considering various quality metrics, as in (Dia, 
1999). Other very important topics in the context of distributed embedded systems 
must still be considered in the future: the synthesis of the communication between 
processors and the synthesis of the operating system for objects executing various 
functions. Following an approach similar to that used for the processor selection, we 
intend to develop a communication synthesis mechanism based on a library of 
protocols, as in (Hes, 1999). 
AN ARCHITECTURE FOR
RELIABLE DISTRIBUTED
COMPUTER-CONTROLLED
SYSTEMS

Luis Miguel Pinho, Francisco Vasques 



In Distributed Computer-Controlled Systems (DCCS), both real-time
and reliability requirements are of major concern.
Architectures for DCCS must be designed considering the
integration of processing nodes and the underlying
communication infrastructure. Such integration must be provided
by appropriate software support services.

This paper presents an architecture for DCCS, outlines its
structure, and describes the services provided by the support
software. These services are designed to guarantee the real-time
and reliability requirements of current and future systems.



1. Introduction 

Distributed Computer-Controlled Systems (DCCS) are increasingly used in
industrial environments, where computer systems are expected to perform correctly,
even in the presence of faults. The traditional approach to guaranteeing the
dependability requirements of DCCS is to replicate some of their components, in order
to tolerate individual faults. However, when replicated components are used,
reliable and time-bounded communication services are needed. Messages must be
delivered correctly and in order, according to their timing requirements. Therefore,
the full integration of the communication infrastructure with the processing nodes is
required in order to obtain the desired level of confidence in the system.

Using commercial off-the-shelf (COTS) components as the system's building blocks
provides a cost-effective solution, and at the same time allows easy upgrade and
maintenance of the system. However, as COTS hardware and software do not usually
provide the confidence level required by reliable real-time applications, reliability
requirements must be guaranteed by a software-based fault-tolerance approach.

The use of COTS components usually implies the use of fail-uncontrolled
components. It is not possible to guarantee fail-silent properties for off-the-shelf
hardware and/or software, as these components usually lack the self-checking
mechanisms required to detect faults. Fail-uncontrolled components require the
use of active replication (Powell, 1994), since masking faults in one component
requires the replication of that component in other nodes. Consequently, a
COTS-based system must be able to manage such component replication on its own.
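The paper does not specify at this point how the outputs of active replicas are combined. Majority voting is one common technique: with 2f + 1 replicas that receive the same inputs, a strict-majority vote masks up to f replicas delivering wrong values. The following minimal Python sketch illustrates only this general idea, not the mechanism of the proposed architecture, and assumes replica outputs are hashable values.

```python
from collections import Counter


def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of the replicas,
    or None when no strict majority exists (the fault cannot be masked)."""
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) // 2 else None


print(majority_vote([42, 42, 17]))  # 42: one faulty replica is masked
print(majority_vote([1, 2, 3]))     # None: no majority
```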

The proposed architecture is intended to provide a guaranteed (timely and reliable)
execution environment for hard real-time applications. In addition, it is intended
to provide adequate quality of service to soft real-time applications, which must
not interfere with the behaviour of the hard real-time applications. It is not intended
for safety-critical systems, as these systems require a greater level of dependability
and a more restricted set of failure assumptions (Laprie, 1992).



2. System Architecture 

The system architecture (Figure 1) is based on the use of a set of processing nodes, 
where distributed hard real-time applications may execute. To ensure the desired 
level of reliability to hard real-time applications, specific components of these 
applications may be replicated. 




Figure 1. System architecture.

Nodes are interconnected by a real-time network, which provides the
communication infrastructure for the hard real-time applications (interconnecting
controllers, sensors and actuators). This real-time network is also intended to support
the replica management mechanisms. At a higher level, since the DCCS must be
interconnected with its upper levels (e.g. for remote access, remote supervision
and/or remote management), a general-purpose network interconnects some of the
DCCS nodes.

2.1. Node Architecture 

Each node (Figure 2) integrates both a hard real-time subsystem (HRTS) and a soft 
real-time subsystem (SRTS). The goal of the HRTS is to provide a framework to 
support reliable hard real-time applications, which are at the core of the system. The 
SRTS provides the interface for the remote supervision and management of the DCCS. 

Figure 2. Node structure. 

The communication mechanisms between both subsystems must guarantee that 
failures in the (less reliable) SRTS do not interfere with the timing and reliability 
requirements of the HRTS. Therefore, mechanisms for memory partitioning 
must be provided, and the communication mechanisms must guarantee the integrity 
of data transferred from the SRTS to the HRTS, by upgrading its confidence level. 

The HRTS is responsible for providing a framework for reliable execution of hard 
real-time applications. Hence, applications have guaranteed execution resources, 
including processing power, memory and communication support. This calls for a 
separate real-time communication network for the HRTS, where messages sent 
from one node to another are received and processed within a bounded time interval. The 
HRTS Support Services are responsible for the real-time communication 
management and also provide a transparent framework for the replication of 
application components. 

The SRTS provides a set of services to support the supervision and management 
level of the DCCS. It may provide CORBA/HTTP servers, which can be accessed 
using supervision and management tools. At this system level, flexibility is a major 
goal, since new services can be created as the system is upgraded. 

2.2. Communication Infrastructure 

Current work is being performed to assess the suitability of the Controller 
Area Network (CAN) (ISO, 1993) to act as the real-time network. Although 
originally designed for use within road vehicles, CAN is also being considered for 
automated manufacturing and distributed process control environments (Zuberi 
and Shin, 1997). Several studies on how to guarantee the real-time requirements of 
messages in CAN networks are available (e.g. (Tindell et al., 1995)). Nevertheless, 
the continuity of service is not fully guaranteed, since it may be disturbed by 
temporary periods of network inaccessibility (periods during which stations cannot 
communicate with each other, due to on-going error detection and 
recovery mechanisms). A study of the inaccessibility characteristics of CAN 
networks has been presented in (Rufino and Verissimo, 1995), identifying the 
duration of its error detection and recovery periods. The integration of the 
inaccessibility studies with the timing analysis (Pinho et al., 2000a) indicates that 
CAN presents some problems, as it is not able to provide different integrity levels to 
the supported applications. However, it is also perceived that, under an appropriate 
set of fault assumptions, it can be used to support reliable real-time DCCS (Pinho et 
al., 2000a). 



3. Hard Real-Time Subsystem 

The HRTS allows real-time applications to be distributed over the nodes of the 
system (Figure 3). It is based on the software integration of COTS components, that 
is, “replication handled entirely by software using off-the-shelf hardware” 
(Guerraoui and Schiper, 1997), rather than building software on top of specialised 
hardware. 

Figure 3. HRTS structure. 

The HRTS provides a framework to support hard real-time applications, where 
timing requirements are guaranteed through the use of current off-line schedulability 
analysis techniques (Response-Time Analysis (Audsley et al., 1993)). A multitasking 
environment is provided to support real-time applications, with services for task 
communication and synchronisation (including distribution). 
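As an illustration of the off-line analysis, the sketch below computes worst-case response times with the standard recurrence R_i = C_i + B_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j, iterated to a fixed point. It is a minimal sketch assuming fixed priorities, tasks listed in decreasing priority order, deadlines no longer than periods and no release jitter; the `Task` fields and the `response_time` helper are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    C: int      # worst-case execution time
    T: int      # period, or minimum inter-arrival time of a sporadic task
    D: int      # relative deadline (assumed D <= T)
    B: int = 0  # worst-case blocking by lower-priority tasks


def response_time(tasks: List[Task], i: int) -> Optional[int]:
    """Worst-case response time of tasks[i], or None if it exceeds the deadline.

    tasks must be sorted by decreasing priority, so tasks[:i] is hp(i).
    """
    task = tasks[i]
    r = task.C + task.B
    while True:
        interference = sum(math.ceil(r / hp.T) * hp.C for hp in tasks[:i])
        r_next = task.C + task.B + interference
        if r_next > task.D:
            return None      # deadline exceeded: task set not schedulable
        if r_next == r:
            return r         # fixed point reached
        r = r_next


tasks = [Task("t1", 1, 4, 4), Task("t2", 2, 6, 6), Task("t3", 3, 12, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])  # [1, 3, 10]
```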

A hard real-time application consists of several tasks (processing units), 
which together perform the desired service. In Figure 4, a hard real-time 
application is divided into four tasks, which execute in different nodes of the HRTS. 
Each node has its own (non-distributed) COTS kernel and hardware, which provide 
the desired real-time multitasking support. An additional advantage of using both a 
COTS kernel and COTS hardware is that they ease the upgrading and the porting of 
the system. 

The goal of the HRTS support software (Figure 4) is to provide the distribution 
support (including both the application distribution and the replication management) 
to hard real-time applications. This module manages the communication between 
different nodes, resulting from the replica management, the application distribution 
and the interface with the controlled environment. 

The HRTS supports the active replication of software with dissimilar task sets in 
each node. The reason for allowing dissimilar task sets is twofold. By providing 
different execution environments in each node, the tolerance to design faults is 
increased, as the probability of the same fault occurring in more than one node 
decreases. At the same time, the architecture flexibility is increased, since nodes are 
not just duplicates, allowing for a more flexible design of real-time applications. 

Figure 4. Distributed Hard Real-Time Application. 

However, multitasking applications with differentiated execution environments 
are likely to result in replicated components with non-deterministic executions. 
Hence, the HRTS support software provides mechanisms to guarantee deterministic 
execution. As these mechanisms need to be time-efficient, they are not based on 
replica co-ordination but on the concept of timed messages (Poledna et al., 2000). 

A layered approach is used for the HRTS, in order to simplify the system 
development. The HRTS support software (Figure 5) comprises two 
layers: 

1. The Communication Manager layer, which is responsible for the reliable and 
timely transfer of real-time data; 

2. The Replica Manager layer, which is responsible for the transparent 
management of the replicated components, so that the programmer is not burdened 
with explicitly programming replica management mechanisms. 

Figure 5. Hard Real-Time Subsystem layers. 

3.1. Scheduling model 

The HRTS is intended to support one or more hard real-time applications. Each 
application consists of a set of related tasks (τ1 ... τn), each task being a single 
processing unit. Tasks from the same application can be allocated to different nodes 
(distributed environment). In order to use the well-known Response Time Analysis 
(Audsley et al., 1993), each task is released by only one invocation event, but can be 
released an unbounded number of times. A periodic task is released by the runtime 
(temporal invocation), while a sporadic task can be released either by another task or 
by the environment. After being released, a task cannot suspend itself or be blocked 
while accessing remote data (external blocking). 

Tasks are allowed to communicate with each other either through shared data or 
through release event objects. Shared data objects are used for asynchronous data 
communication between tasks, while release event objects are used for the release of 
sporadic tasks. Tasks are designed as small processing units, which, in each 
invocation, read inputs; carry out the processing; and output the results. The goal is 
to minimise task interaction, in order to improve the schedulability analysis and 
increase the system’s efficiency. 
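The two communication objects can be sketched with Python threads, purely to show their semantics: a shared data object gives lock-protected asynchronous access to the latest value, and a release event object counts pending releases for a sporadic task. The class names `SharedData` and `ReleaseEvent` and the `sporadic_task` body are hypothetical; a real HRTS would rely on the services of its COTS real-time kernel rather than on Python threading.

```python
import threading


class SharedData:
    """Shared data object: lock-protected, asynchronous access to the latest value."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def write(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value


class ReleaseEvent:
    """Release event object: a sporadic task waits here until it is released."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def trigger(self):
        with self._cond:
            self._pending += 1
            self._cond.notify()

    def wait(self):
        with self._cond:
            while self._pending == 0:
                self._cond.wait()
            self._pending -= 1


def sporadic_task(release, inputs, outputs, releases):
    """Hypothetical sporadic task: read inputs, process, output, once per release."""
    for _ in range(releases):
        release.wait()                    # waiting happens only between invocations
        outputs.write(inputs.read() * 2)  # read, process and write via short locked accesses


inputs, outputs, release = SharedData(0), SharedData(None), ReleaseEvent()
worker = threading.Thread(target=sporadic_task, args=(release, inputs, outputs, 1))
worker.start()
inputs.write(21)
release.trigger()
worker.join()
print(outputs.read())  # 42
```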

As there is no synchronous interaction between tasks, a task cannot be released 
directly by another task. Thus, sporadic tasks are suspended waiting in a 
release event object, which is triggered by the releasing tasks, whereas the runtime 
executive triggers periodic tasks. Internal blocking due to task communication can be 
bounded and analysed off-line using Priority Ceiling Protocols (Sha et al., 1990). 
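Under the Priority Ceiling Protocol each resource gets a ceiling equal to the highest priority of the tasks using it, and a task can be blocked at most once, for the longest critical section of a lower-priority task on a resource whose ceiling is at least its own priority. The sketch below computes that bound B_i, which is the blocking term of Response-Time Analysis; the task names, priorities and critical-section lengths are hypothetical, and larger numbers denote higher priorities.

```python
from typing import Dict, List, Tuple

CriticalSection = Tuple[str, str, int]  # (task, resource, duration)


def ceilings(prio: Dict[str, int], sections: List[CriticalSection]) -> Dict[str, int]:
    """Ceiling of each resource: highest priority of any task that uses it."""
    ceiling: Dict[str, int] = {}
    for task, resource, _ in sections:
        ceiling[resource] = max(ceiling.get(resource, prio[task]), prio[task])
    return ceiling


def blocking(task: str, prio: Dict[str, int], sections: List[CriticalSection]) -> int:
    """PCP bound: longest lower-priority critical section on a resource
    whose ceiling is >= the priority of task."""
    ceiling = ceilings(prio, sections)
    return max((d for t, r, d in sections
                if prio[t] < prio[task] and ceiling[r] >= prio[task]), default=0)


prio = {"t1": 3, "t2": 2, "t3": 1}
sections = [("t1", "S1", 1), ("t2", "S1", 2), ("t2", "S2", 1),
            ("t3", "S1", 3), ("t3", "S2", 4)]
print({t: blocking(t, prio, sections) for t in prio})  # {'t1': 3, 't2': 4, 't3': 0}
```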

3.2. Replication Model 

Since reliability is to be achieved through replication, it is important to define the 
replication unit (that is, the smallest replicated entity). Therefore, the notion of 
component is introduced. Applications are divided into components, each one being a 
set of tasks and resources that interact to perform a common job. A component can 
include tasks and resources from several nodes, or it can be located in just one node. 
In each node, several components may coexist. As an example, Figure 6 shows a 
real-time application with 4 tasks (τ1, τ2, τ3 and τ4) divided into two different 
components. Component C1 encompasses tasks τ1 (node 1) and τ2 (node 2). Its 
replica encompasses tasks τ1’ (node 3) and τ2’ (node 5). Component C2 encompasses 
tasks τ3 (node 2) and τ4 (node 3), while its replica encompasses tasks τ3’ (node 4) and 
τ4’ (node 5). 
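The configuration of Figure 6 can be written down as data, which makes simple checks possible. The sketch below records, for each component, one task-to-node map per replica, and checks that no node hosts tasks of two replicas of the same component, so that a single node failure affects at most one replica. This disjointness rule is an assumption made for the example (the text does not state it explicitly), and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

# component -> one {task: node} map per replica
Placement = Dict[str, List[Dict[str, int]]]

FIGURE_6: Placement = {
    "C1": [{"t1": 1, "t2": 2}, {"t1'": 3, "t2'": 5}],
    "C2": [{"t3": 2, "t4": 3}, {"t3'": 4, "t4'": 5}],
}


def replication_degree(placement: Placement) -> Dict[str, int]:
    """The n of each n-replicated component."""
    return {c: len(replicas) for c, replicas in placement.items()}


def replicas_node_disjoint(placement: Placement) -> bool:
    """True if no node hosts tasks of two replicas of the same component (assumed rule)."""
    for replicas in placement.values():
        for a, b in combinations(replicas, 2):
            if set(a.values()) & set(b.values()):
                return False
    return True


print(replication_degree(FIGURE_6))      # {'C1': 2, 'C2': 2}
print(replicas_node_disjoint(FIGURE_6))  # True
```

Nodes may still host tasks of different components (node 2 runs τ2 of C1 and τ3 of C2), as the text allows.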




Figure 6. Replicated Hard real-time application. 

A concept similar to the component can be found in the notion of “capsules” of 
the Delta-4 architecture (Powell, 1991). Like the component, a Delta-4 “capsule” is the 
unit of replication, embodying a set of tasks (referred to as threads) and objects. 
However, a “capsule” has its own thread scheduling and a separate memory space, 
and is also the unit of distribution. Thus, the Delta-4 concept of “capsule” is closer 
to Unix processes, whilst the component presented here is a more lightweight 
concept, which is used to structure replication units. 

By creating components, it is possible to define the replication degree of specific 
parts of the application, according to its desired reliability level and the reliability of 
its components. A component replicated n times is referred to as an n-replicated 
component. In Figure 6, both components C1 and C2 are 2-replicated components. 

Replicating components decreases efficiency, as the number of tasks and 
messages increases and agreement on the output of computations is needed. 
Hence, it is possible to trade reliability for efficiency and vice-versa. 
Although efficiency should not be regarded as the goal of a reliable system, it can be 
increased by decreasing the degree of redundancy of the more reliable 
components (if their higher reliability can be guaranteed). 
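The trade-off can be quantified under simplifying assumptions. The sketch below assumes independent replica failures, a per-replica reliability r over the mission time and majority voting, so that a component with n replicas survives if at least ⌊n/2⌋ + 1 of them are correct; the number of tasks grows linearly with n. These assumptions and the helper name `k_of_n_reliability` are illustrative and are not part of the replication model described here.

```python
from math import comb


def k_of_n_reliability(r: float, n: int, k: int) -> float:
    """Probability that at least k of n independent replicas, each with reliability r, are correct."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


TASKS_PER_REPLICA = 2  # hypothetical component with two tasks
for n in (1, 3, 5):
    k = n // 2 + 1  # a strict majority of replicas must be correct
    print(f"n={n}: reliability={k_of_n_reliability(0.99, n, k):.6f}, "
          f"tasks={n * TASKS_PER_REPLICA}")
```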

The component is the fault-containment unit. Faults in one task may produce the 
failure of the component. However, if a replica of the component fails, the 
application will not fail, since the output consolidation will mask the failed replica. 
Therefore, in the replication model, the outputs of internal tasks (within a 
component) do not need to be agreed upon. Output consolidation is only needed when 
results are made available to other components or to the controlled system. As can be 
seen in Figure 7, several possibilities exist for the configuration of an application. 
The first part of the Figure shows the same configuration presented in Figure 6, 
while the second part shows a solution where the application is divided into three 
components and only component C2 is replicated. The double arrows indicate 
communication between different components, that is, communication requiring 
consolidated data. 




Figure 7. Examples of application configuration. 

Note that the second solution is more efficient, as it has only two more tasks 
than strictly needed by the application. However, the assumed reliability of 
the sensor and of components C1 and C3 (and of the nodes where they execute) must 
be higher than in the previous solution, as they are not replicated. 
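The distinction between intra-component and inter-component communication can be made mechanical once tasks are mapped to components. The sketch below uses the component structure of Figure 6 (τ1 and τ2 in C1, τ3 and τ4 in C2, both 2-replicated), but the data-flow edges are hypothetical, since the text does not list them; it returns the edges that need consolidated data and the total number of tasks after replication.

```python
from typing import Dict, List, Tuple

# Component structure of Figure 6.
COMPONENT_OF: Dict[str, str] = {"t1": "C1", "t2": "C1", "t3": "C2", "t4": "C2"}
DEGREE: Dict[str, int] = {"C1": 2, "C2": 2}
# Hypothetical data flow between tasks (not given in the text).
EDGES: List[Tuple[str, str]] = [("t1", "t2"), ("t2", "t3"), ("t3", "t4")]


def consolidated_edges(edges: List[Tuple[str, str]],
                       component_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Edges crossing a component boundary; only these need agreed (consolidated) data."""
    return [(s, d) for s, d in edges if component_of[s] != component_of[d]]


def total_tasks(component_of: Dict[str, str], degree: Dict[str, int]) -> int:
    """Number of tasks after replication: each task is instantiated once per replica."""
    return sum(degree[c] for c in component_of.values())


print(consolidated_edges(EDGES, COMPONENT_OF))  # [('t2', 't3')]
print(total_tasks(COMPONENT_OF, DEGREE))        # 8
```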

Replicas must execute deterministically, that is, replicated tasks must execute 
with the same data, and timing-related decisions must be the same 
in each replica. This determinism can be achieved by restricting the application from 
using timing non-deterministic mechanisms. However, the use of multitasking would 
then not be possible, since task synchronisation and communication mechanisms 
inherently lead to timing non-determinism. The use of timed messages (Poledna et 
al., 2000) allows a restricted model of multitasking to be used and at the same time 
eliminates the need for agreement between the internal tasks of each component. 
With timed messages, agreement is only needed to guarantee that all replicated 
components work with the same input values and that they all vote on the final 
output. The use of timed messages implies the use of appropriate clock 
synchronisation algorithms, since clocks with a bounded difference are 
required. 
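The core idea of timed messages can be sketched as follows: each message carries a validity instant equal to its send time plus a bound Δ covering the maximum transmission delay and the clock precision, and a task reads the most recent value that is already valid at its release instant. Two replicas released at the same instant then read the same value even when messages arrive at different local times. This is a simplified illustration, not the mechanism of Poledna et al. (2000); the class names and the value of `DELTA` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

DELTA = 500  # hypothetical bound (µs): max transmission delay + clock precision


@dataclass
class TimedMessage:
    value: Any
    valid_at: int  # send timestamp + DELTA, in synchronised global time


@dataclass
class TimedInbox:
    messages: List[TimedMessage] = field(default_factory=list)

    def deliver(self, msg: TimedMessage) -> None:
        self.messages.append(msg)

    def read(self, release_time: int) -> Optional[Any]:
        # Only messages already valid at the release instant are visible,
        # so a message that arrived early at one replica is ignored there too.
        valid = [m for m in self.messages if m.valid_at <= release_time]
        if not valid:
            return None
        return max(valid, key=lambda m: m.valid_at).value


def send(value: Any, send_time: int) -> TimedMessage:
    return TimedMessage(value, send_time + DELTA)


m1, m2 = send(10, 0), send(20, 1000)
replica_a, replica_b = TimedInbox(), TimedInbox()
replica_a.deliver(m1)
replica_a.deliver(m2)  # m2 has already arrived at replica A
replica_b.deliver(m1)  # m2 is still in transit to replica B
print(replica_a.read(1200), replica_b.read(1200))  # 10 10
```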

3.3. HRTS Replica Manager 

The goal of the Replica Manager layer is to provide hard real-time applications with 
the set of resources required for communication between distributed tasks and 
between replicated components. In the HRTS, tasks communicate with each other by 
using shared data and release event objects. However, these mechanisms must 
differ depending on whether they are used for intra-component or inter-component 
communication, and on whether the communication is due to distribution or to the 
replication mechanisms. 

If precedence relations exist between tasks, the communication mechanisms can 
be simplified, since these precedence relations guarantee deterministic execution 
(Wellings et al., 1998). If the receiving task is sporadic and is released by a sending 
task, it is guaranteed that, in all replicated components, the replicas of the task will 
execute with the same data. The same reasoning can be applied when the receiving 
task is periodic with a period related to the period of the sender task. 

Although the goal of the replica manager is to transparently manage distribution 
and replication, it is considered that a completely transparent use of these 
mechanisms may introduce unnecessary overheads, since there are some special 
cases that must be considered. Therefore, the application programmer does not 
consider the use of components at the design phase (transparent approach). Later, in a 
configuration phase, the system engineer configures the components and their 
replication level and allocates the different tasks in the distributed system. In this 
phase, the communication streams that need timed messages are identified. 
Guidelines for splitting the application into components are to be developed to ease the 
job of engineers. 

3.4. HRTS Communication Manager 

The Communication Manager layer is responsible for providing a reliable and timely 
transfer of real-time data. The group communication abstraction is used as the 
framework for reliable communication and to support the replica management 
(Powell, 1994). In the replication model, a set of replicas from the same component 
is referred to as a group. The Communication Manager must provide the following set 
of mechanisms: 

1. 1-to-many communication, when a task of a non-replicated component 
wishes to disseminate its result to the n input tasks of an n-replicated 
component (reliable multicast protocol). 

2. Many-to-1 communication, when an input task of a non-replicated component 
receives inputs from an n-replicated component (consensus algorithm). 

3. Many-to-many communication, when a group of n output tasks of an 
n-replicated component disseminates its results to the n input tasks of an 
n-replicated component (interactive consistency (Pease et al., 1980) algorithm). 

4. 1-to-1 communication, for communication between tasks of the same 
component (intra-component communication) or between the output task of a 
non-replicated component and the input task of another non-replicated component 
(no need for specific algorithms). 
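The four cases reduce to a choice based on the replication degrees of the sending and receiving components. The sketch below encodes that choice; the enum and function names are hypothetical, and each returned value only names the kind of algorithm listed above.

```python
from enum import Enum


class Mechanism(Enum):
    ONE_TO_ONE = "1-to-1: no specific algorithm"
    ONE_TO_MANY = "1-to-many: reliable multicast"
    MANY_TO_ONE = "many-to-1: consensus"
    MANY_TO_MANY = "many-to-many: interactive consistency"


def select_mechanism(sender_degree: int, receiver_degree: int,
                     same_component: bool = False) -> Mechanism:
    """Pick the communication mechanism from the replication degrees of both ends."""
    if sender_degree < 1 or receiver_degree < 1:
        raise ValueError("replication degrees must be at least 1")
    if same_component:
        return Mechanism.ONE_TO_ONE  # intra-component: no agreement needed
    if sender_degree == 1 and receiver_degree == 1:
        return Mechanism.ONE_TO_ONE
    if sender_degree == 1:
        return Mechanism.ONE_TO_MANY
    if receiver_degree == 1:
        return Mechanism.MANY_TO_ONE
    return Mechanism.MANY_TO_MANY


print(select_mechanism(1, 3).value)  # 1-to-many: reliable multicast
print(select_mechanism(3, 3).value)  # many-to-many: interactive consistency
```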

The suitability of the CAN protocol for the communication infrastructure is being 
studied (Pinho et al., 2000a; Pinho et al., 2000b). Although current results indicate 
that CAN presents some problems, as it is not resilient to station errors, it is perceived 
that, with the appropriate set of fault assumptions, it can be used as the 
communication infrastructure. 

3.5. Interconnection with the outside world 

The interconnection of the HRTS with the SRTS must provide mechanisms for the 
transfer of information between both subsystems. Communication from the HRTS to 
the SRTS does not present any major problem, since this information is assumed to 
have a higher reliability level. However, if the output to the SRTS comes 
from replicated components, appropriate agreement must be performed. Conversely, 
the reliability of the data arriving from the SRTS must be increased, in order to 
prevent the introduction of erroneous values. Also, if the data is to be provided to 
replicated components, reliable communication algorithms must be used to 
disseminate it. 
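One possible way to raise the confidence level of SRTS data before it enters the HRTS is to combine an integrity check with a plausibility check. The sketch below assumes a hypothetical message format (a 4-byte signed set-point protected by a CRC-32 computed by the sender) and hypothetical range limits; it is only an example of the kind of filtering meant, since the text does not prescribe specific checks.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetPointMessage:
    payload: bytes  # hypothetical format: 4-byte big-endian signed set-point
    crc: int        # CRC-32 of payload, computed by the SRTS sender


def accept_from_srts(msg: SetPointMessage, low: int, high: int) -> Optional[int]:
    """Return the set-point if it passes all checks, otherwise None (value rejected)."""
    if len(msg.payload) != 4:
        return None                 # malformed message
    if zlib.crc32(msg.payload) != msg.crc:
        return None                 # corrupted on the way to the HRTS
    value = int.from_bytes(msg.payload, "big", signed=True)
    if not low <= value <= high:
        return None                 # implausible for the controlled process
    return value


payload = (250).to_bytes(4, "big", signed=True)
print(accept_from_srts(SetPointMessage(payload, zlib.crc32(payload)), 0, 500))  # 250
```

The CRC only detects corruption in transit; the range check rejects some, but not all, erroneous values produced by the SRTS itself.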

Interconnection with the controlled system is performed through 
sensors and actuators. Sensor values can be treated as the output of non-replicated 
components, and their dissemination must be performed according to the desired 
reliability. The time at which the value is valid must be agreed upon. Output to 
actuators must also be agreed upon between different replicas. Such agreement may 
be performed either in the computational system or by the actuators themselves, 
through mechanical or electronic voting on the result. 
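When replicas output analogue commands that may differ slightly, exact-match voting is not applicable; a common alternative is mid-value selection, which forwards the median of an odd number of replica outputs and masks one arbitrary outlier out of three. The sketch below is an example of such electronic voting, not a mechanism prescribed by the architecture; the function name is hypothetical.

```python
import statistics
from typing import Sequence


def mid_value_select(outputs: Sequence[float]) -> float:
    """Return the middle value of an odd number of replica outputs."""
    if len(outputs) % 2 == 0:
        raise ValueError("mid-value selection needs an odd number of replicas")
    # median_low always returns one of the input values; for an odd count
    # it is simply the middle element of the sorted outputs.
    return statistics.median_low(outputs)


# Three replicas; the third one has failed and produces an outlier.
command = mid_value_select([10.2, 10.3, 99.0])
print(command)  # 10.3
```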



4. Conclusions 

In this paper, an architecture for Distributed Computer-Controlled Systems (DCCS) 
is presented. It aims to provide a guaranteed (timely and reliable) execution 
environment to current and future systems. 

The structure of the architecture is presented, together with the guidelines used in 
its design, and its scheduling and replication models. The support software, which 
provides distribution support (including both the application distribution itself and 
the replication management) to hard real-time applications, is also discussed. 



PLATFORM FOR MULTIPROCESSOR 
SYSTEM-ON-CHIP DESIGN 

1. Introduction 

This chapter presents a generic architecture model for multiprocessor embedded 
system-on-chip design. The use of this model as a template in a system design 
environment allows for efficient generation of multiprocessor architectures. The key 
characteristics of this model are its high modularity, flexibility and scalability, which 
make it reusable for a large class of applications. In addition, it allows for the reuse 
of pre-designed blocks and for automatic architecture generation. This chapter focuses 
on the definition of the architecture model. The feasibility and effectiveness of this 
architecture model are illustrated by a significant demonstration example. 

Current trends in system design methods are towards codesign of mixed 
hardware/software systems targeting multiprocessor systems-on-chip. This direction is 
imposed by the ever-increasing complexity of systems on chip. Additionally, time-to-market 
pressure requires shortening design time by raising the abstraction level 
of the specification entry. One of the most important issues in multiprocessor design 
is the target architecture. A rigid target architecture may lead to a very 
restricted application field. So, in order to build an efficient multiprocessor design 
flow able to generate complex system architectures from a high-level specification, we 
need a generic architecture model that can be used as a template by the synthesis 
process. Modularity, flexibility and scalability are required to have an efficient 
multiprocessor design flow. By efficient we mean, first of all, feasible 
and applicable to large application fields. Modularity is needed to master complexity. 
It allows the different modules to be designed separately and provides an assembling 
scheme. The most common way to achieve modularity is to separate the inter-subsystem 
communication from the behavior when partitioning a system. Modularity 
allows for reuse of existing modules. Flexibility is required in order to avoid early 
decisions. It allows the designer to decide quite late in the design process which 
technology will be used for the design of each module. When combined with 
modularity, flexibility allows the implementation of a given module to be changed at 
any stage of the design process. For instance, a software module may be converted 
into a hardware module for performance reasons. Scalability allows the same 
architecture model to be adapted to applications of different complexity, for instance by 
increasing the number of processors or communication buses. 



1.1. Related work 

Most of the existing works target monoprocessor architectures, and the most used 
model in this class is the single-CPU, single-ASIC target architecture. Even though 
this architecture is a special and limited example of a distributed system, it is 
relevant in the area of embedded systems [12]. In this class of work we can cite 
LYCOS [11], COSYMA [13], CoWare [16] and PMOSS [6]. Other design 
systems, such as Vulcan [8], TOSCA [1] and COBRA [10], can support more than 
one ASIC. Several research groups have tried to target multiprocessor architectures; we 
can cite POLIS [2], Chinook [4], SpecSyn [7] and the work led by Wolf and Ti-Yen 
[17]. In the POLIS [2] system, the target architecture is a system consisting of 
general-purpose processors combined with a few ASICs and possibly other 
components such as DSPs. The target architecture in the SpecSyn [7] system is a 
heterogeneous multiprocessor with any number of processors, coprocessors, ASIPs or 
FPGAs, communicating through multiple buses. Besides these academic research 
projects, there have also been several industrial initiatives on open standards and design 
methodologies [3][15][18][19][20][21] that try to deal with increasingly 
complex system-on-chip designs. However, we believe that in all the above works, target 
architectures still lack generic aspects and thus only tackle a restricted application 
field. In fact, most of the above-mentioned systems restrict the kind of components 
used and/or the communication network to a few proprietary and/or specific models 
designed to be plugged together. 



1.2. Contribution 

Our long-term objective is the definition of an efficient multiprocessor SoC design 
environment applicable to large application fields and able to generate 
multiprocessor architectures. The main contribution here is the definition of a 
modular, flexible, and scalable architecture model (MFSAM) that may be used for 
an efficient multiprocessor SoC design flow and handles a large class of applications. 
The main features of this model are: 

1. As stated above, the model allows for modularity, flexibility and scalability. 

2. The model allows for automatic generation of architectures. This point needs 
some harmonization with the input model and the targeting algorithms. 

The model is made of a set of processors communicating through a 
communication network (Figure 1). 

Figure 1. Generic architectural model 

A processor may be hardware, software or an IP. The communication network 
may be of any complexity; it may be made of a single bus or a network with complex 
protocols. Processors are linked to the common network through communication 
interfaces. The scalability of this architecture depends on the scalability of the 
chosen communication network. Modularity is ensured by the use of specific 
interfaces to link processors to the communication network. This makes it 
possible to design each part of the application separately, and even to include 
pre-designed modules (IPs). The generic assembling scheme of our model greatly 
increases its modularity. This separation between processor and communication 
network through specific interfaces also provides high flexibility. In fact, if we 
change the technology implementing a given module (processor), the only part of the 
architecture that needs to be changed is the interface of the corresponding module. 
This chapter focuses on the definition of the architecture model and its use within an 
architecture generation flow. 
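The separation between modules and their network interfaces can be illustrated with a small data model. In the sketch below, each module is attached to the network through an interface derived from its technology, so retargeting one module from software to hardware regenerates only that module's interface. The class names and the adapter naming scheme are hypothetical and only mirror the structure of Figure 1.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Module:
    name: str
    kind: str  # "software", "hardware" or "ip"


@dataclass(frozen=True)
class Interface:
    module: str
    adapter: str  # hypothetical name of the generated module-to-network bridge


@dataclass
class Architecture:
    network: str
    modules: Dict[str, Module] = field(default_factory=dict)
    interfaces: Dict[str, Interface] = field(default_factory=dict)

    def add(self, module: Module) -> None:
        self.modules[module.name] = module
        self.interfaces[module.name] = Interface(module.name, f"{module.kind}-to-{self.network}")

    def retarget(self, name: str, kind: str) -> None:
        # Changing the technology of one module regenerates only its own interface.
        self.add(replace(self.modules[name], kind=kind))


arch = Architecture("bus")
arch.add(Module("filter", "software"))
arch.add(Module("control", "software"))
before = dict(arch.interfaces)
arch.retarget("filter", "hardware")
changed = [n for n in arch.interfaces if arch.interfaces[n] != before[n]]
print(changed)  # ['filter']
```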



2. Architecture in System Design 

In this section, we give a brief analysis of system architectures encountered in 
electronic systems [5] [14]. Both embedded system architectures and general-purpose 
computer architectures are made of three elements: basic components, 
communication network, and organization scheme. 



2.1. Basic components 

Basic components are the elements that can be assembled to realize an architecture 
implementing a given application. In a hardware/software codesign context, system 
design components may be classified into three categories: software, hardware, and 
communication components. 



2.2. Communication network 

The communication network constitutes the hardware links that support the 
communication primitives between components. The most direct way to connect the 
components of a system is to have a dedicated communication link between every 
two communicating components. Between this fully connected network and the bus, 
there is a wide range of interconnection networks. These networks are a major 
factor differentiating modern multiprocessor architectures. Interconnection networks 
are built up of switching elements and a topology. The topology is the pattern in 
which the individual switches are connected to other components, like processors, 
memories, I/O devices, and other switches. 
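The cost side of this range can be made concrete by counting links. A fully connected network of n components needs n(n - 1)/2 dedicated links, a ring needs n, and a single shared bus needs one medium; for 8 components this is 28, 8 and 1 respectively. The helper names below are illustrative, and the ring is shown only as one example of an intermediate topology.

```python
def fully_connected_links(n: int) -> int:
    """Dedicated point-to-point links between every pair of n components."""
    return n * (n - 1) // 2


def ring_links(n: int) -> int:
    """Links in a ring of n >= 3 components."""
    if n < 3:
        raise ValueError("a ring needs at least 3 components")
    return n


def bus_links(n: int) -> int:
    """A single shared bus, whatever the number of components."""
    return 1 if n > 0 else 0


for n in (4, 8, 16):
    print(n, fully_connected_links(n), ring_links(n), bus_links(n))
```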



2.3. Organization scheme 

The organization scheme is the method of composing or assembling the basic components to 
construct the system architecture. A system architecture can vary from a simple 
controller to a massively parallel machine. What matters here are the roles 
played by the different basic elements at the global control level of the system. Thus, 
we can classify system architectures into two categories: monoprocessor 
architectures and multiprocessor architectures. In addition, the communication network 
and the programming model play an essential role in the classification of system 
architectures. A monoprocessor architecture consists of one CPU and one or more 
ASICs. This scheme follows a master-slave synchronization scheme where the top 
controller acts as a main processor in charge of coordinating the activities of the 
other components, which act as co-processors. Although very useful in several 
application domains, the single-processor architecture can only provide a restricted 
performance capability because of the lack of true parallelism. 

A multiprocessor architecture allows more flexibility and improved performance 
thanks to the distribution of computation among processors. However, it is much 
more difficult to handle due to the parallelism. 



3. MFS Architecture Model 

The new architecture model that we propose here allows for modularity, flexibility 
and scalability. This model is based on the results of the analysis of existing architectures 
in system design (Section 2). From that huge design space we chose the most 
appropriate elements that fit our needs. In addition to modularity, flexibility and 
scalability, these needs are adaptability to multiprocessor SoC design and the 
possibility of automatically generating the final architecture. Also, we are mostly 
interested in architectures suitable for embedded systems on chip, which are very 
relevant in many potential application fields. In this section, we develop the new 
architecture model we propose. 

3.1. Components 

The components of our architecture model belong to the three essential categories: 
software, hardware, and communication components. They comprise software 
processors, hardware blocks, memories, and communication interfaces. 

3.1.1. Software processors. In this category we can include off-the-shelf 
microprocessors, microcontrollers, DSPs, and application-specific processors (i.e. 
ASIPs). Most of these components satisfy the attributes required by our architecture 
model for this category. The processor should be able to 
communicate through an external memory bus (synchronous or asynchronous). It 
should also be able to handle external interrupts. Additional I/O ports and internal 
peripherals may be useful but are not necessary. The processor-memory interface and 
interrupts are essential for building the processor-network communication interface. 
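The attributes above can be checked mechanically before a processor is admitted into a platform. The sketch below is a hypothetical data model: the two mandatory attributes come from the text (external memory bus and external interrupts), while the field names and the example processors are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SoftwareProcessor:
    name: str
    external_memory_bus: bool  # synchronous or asynchronous
    external_interrupts: bool
    io_ports: int = 0          # optional: useful but not necessary


def missing_attributes(p: SoftwareProcessor) -> List[str]:
    """List the mandatory attributes the processor lacks (empty means it fits the model)."""
    missing = []
    if not p.external_memory_bus:
        missing.append("external memory bus")
    if not p.external_interrupts:
        missing.append("external interrupts")
    return missing


cpu = SoftwareProcessor("cpu_a", external_memory_bus=True, external_interrupts=True)
dsp = SoftwareProcessor("dsp_b", external_memory_bus=True, external_interrupts=False, io_ports=2)
print(missing_attributes(cpu))  # []
print(missing_attributes(dsp))  # ['external interrupts']
```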

3.1.2. Hardware blocks. To add a pre-designed hardware block (i.e. IP) to 
our architecture model, it must be provided with a communication interface adapted 
to the network, because there is no standardized bus interface for 
such components. 

3.1.3. Memory. Memories are essential components in system design. They are 
integrated in our architecture model as pre-designed blocks. Thus, to use a memory 
block only its access scheme and access timing are required. A memory controller 
with an address decoder must be added to adapt the processor bus or network bus to 
the memory (local and global memories). 
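The address decoder of such a memory controller essentially maps each bus address to the memory block that must respond. The sketch below shows that mapping for a hypothetical address map with one local and one global memory; base addresses, sizes and names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MemoryRegion:
    name: str
    base: int
    size: int

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


# Hypothetical address map: 64 KiB local memory and 1 MiB global memory.
ADDRESS_MAP: Tuple[MemoryRegion, ...] = (
    MemoryRegion("local_ram", 0x0000_0000, 0x1_0000),
    MemoryRegion("global_mem", 0x8000_0000, 0x10_0000),
)


def decode(address: int, regions: Sequence[MemoryRegion] = ADDRESS_MAP) -> Optional[str]:
    """Return the name of the region selected by address, or None (no device responds)."""
    for region in regions:
        if region.contains(address):
            return region.name
    return None


print(decode(0x0000_FFFF))  # local_ram
print(decode(0x8000_0010))  # global_mem
print(decode(0x4000_0000))  # None
```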

3.1.4. Communication Interfaces. Communication interfaces provide bridges 
between the previous components and the communication network. For software 
processors, these interfaces can be generated according to both the processor 
attributes (e.g. memory bus, interrupts and I/O ports) and the communication network. 
For hardware blocks, communication interfaces are often built in or supplied with the block. 



3.2. Communication Network 

Although a wide range of communication architectures exists, only a few are 
suitable for embedded system-on-chip design methods. At a high abstraction level, 
communications are established through abstract channels; however, at the physical 
level these abstract channels must be transformed into physical wires. The most 
commonly supported communication network is the point-to-point network. The 
great advantage of this network is that it already allows automatic architecture 
generation from a system-level specification. Such a network achieves high 
performance but has a high cost. A more sophisticated network that can also satisfy 
the same conditions is the hierarchical bus network. However, it is 
more difficult to support in system design methods. These two networks cope 
quite well with the increasing complexity of embedded systems on chip. 



3.3. Organization scheme 

As Figure 1 shows, the proposed architecture model has a multiprocessor 
organization scheme. The model is made of a set of components communicating 
through a communication network. The control is distributed among the different 
components that constitute the architecture. Of course, the components and the 
communication network follow the specifications given above. 



4. Architecture Generation Flow Based on MFSAM 

In this section, we present the overall flow for architecture generation based on the 
MFSAM presented in Section 3. As stated before, the final aim of this work is to 
build an efficient multiprocessor SoC design tool able to generate complex 
application-specific architectures for a large class of applications. The 
modularity, flexibility and scalability of the proposed model lead to a wide 
application domain, but require a large number of parameters to be fixed at the design of 
each application. Thus, to make the proposed model practical as a 
template for a multiprocessor SoC design tool, we chose to derive from it a set of 
architecture platforms, each of which targets a set of applications (i.e. an application 
field); see Figure 2. This specialization of the architecture model should also assist 
the designer in the implementation choices for the application. 




Figure 2. Architecture platforms based on MFSAM 

In addition, as all of these architecture platforms are derived from the same 
architecture model, the same architecture generation flow and generation tool can 
support all of them. The architecture generation tool takes as input an 
architecture platform (depending on the application field) and the application-specific 
parameters that configure the platform. These application-specific 
parameters may be the results of a system-level synthesis tool (e.g. a hardware/software 
codesign tool). The architecture generation flow is shown in figure 3.




Figure 3. MFSAM-based architecture generation flow 
Further details on the implementation and efficiency of this flow are 
given in the next section through a demonstration example.
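The flow can be pictured as a function that takes a platform and application-specific parameters and returns a configured architecture. The sketch below assumes that a platform is described by the processor types it allows, and that the parameters list processor instances and inter-processor channels; all names (`Platform`, `AppParams`, `generate_architecture`) are hypothetical and only illustrate configuring one template per application field.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Architecture platform derived from the generic model (hypothetical)."""
    name: str
    processor_types: frozenset
    network: str = "point-to-point"


@dataclass
class AppParams:
    """Application-specific parameters, e.g. produced by a codesign tool."""
    processors: dict                              # instance name -> processor type
    channels: list = field(default_factory=list)  # (src, dst) pairs


def generate_architecture(platform, params):
    """Check the parameters against the platform, then derive the components."""
    for inst, ptype in params.processors.items():
        if ptype not in platform.processor_types:
            raise ValueError(f"{inst}: {ptype} not supported by {platform.name}")
    components = []
    for inst, ptype in params.processors.items():
        components.append(("processor", inst, ptype))
        components.append(("local_memory", f"{inst}_mem", inst))
    for src, dst in params.channels:
        if src not in params.processors or dst not in params.processors:
            raise ValueError(f"channel {src}->{dst} references an unknown processor")
        # One generated communication interface at each end of the channel.
        components.append(("comm_interface", f"{src}_to_{dst}_tx", src))
        components.append(("comm_interface", f"{src}_to_{dst}_rx", dst))
    return {"platform": platform.name, "network": platform.network,
            "components": components}


platform = Platform("arm7_68k", frozenset({"ARM7", "MC68000"}))
params = AppParams({"p0": "ARM7", "p1": "MC68000"}, [("p0", "p1")])
arch = generate_architecture(platform, params)
print(len(arch["components"]))  # 6
```

Because every platform comes from the same model, the same generator serves all of them; only the `Platform` value changes.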



5. Demonstration Example 

In this example we show the feasibility and demonstrate the efficiency of the proposed 
architecture model. A multiprocessor architecture platform based on MFSAM is 
proposed. Then, several application-specific architectures are generated for 
two applications. The generated architectures were validated 
by cosimulation. Analysis of the results shows the effectiveness of the proposed 
architecture model.



5.1. A multiprocessor architecture platform based on MFSAM 

Since we already had the development kits for the two software processors ARM7 
and MC68000, we considered an architecture platform based on these two 
processors. The architecture platform consists of the following components: two 
types of software processors (ARM7 and MC68000), local memories and 
communication interfaces. The communication network is a point-to-point network. 
The block diagram of this architecture platform is given in figure 4. 

Figure 4. A multiprocessor architecture platform

The free parameters of this architecture platform are the number of software 
processors, the number of I/O channels for each software processor, the 
interconnections between them and the interconnections with external systems. 
These parameters make the platform scalable and therefore enable the design 
of application-specific architectures of different scales. As mentioned in 
section 4, the application-specific architecture is generated by an architecture 
generation tool, which is the subject of other work in our research group. 
Currently, communication interfaces are generated almost automatically as hardware 
blocks according to the processor attributes and to the application parameters 
(communication channels). To validate the generated architecture, we need a 
cycle-accurate executable architecture that can run the application. To that end, we 
used a cosimulation approach [9]. In this approach, software processors are replaced 
by cycle-accurate instruction set simulators combined with bus functional models 
(ISS + BFM). All other parts of the architecture are modeled 
in VHDL at RT level and executed by a VHDL simulator (e.g. VSS). The cosimulation tool 
ensures the interconnection and synchronization of the running simulators for 
coherent execution of the overall system.
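The synchronization role of the cosimulation tool can be sketched as a master that advances every simulator one clock cycle at a time and exchanges the boundary signals between them. This is a minimal sketch assuming a simple lock-step scheme; `IssModel` and `HdlModel` are hypothetical stand-ins for a cycle-accurate ISS with its BFM and for the VHDL simulator, not the interface of the cosimulation environment used by the authors.

```python
class IssModel:
    """Hypothetical stand-in for a cycle-accurate ISS + BFM."""
    def __init__(self):
        self.cycle = 0
        self.last_in = None

    def step(self, pin_in):
        # One processor clock cycle: sample the input pins, then drive the
        # output pins (here simply with the current cycle number).
        self.cycle += 1
        self.last_in = pin_in
        return self.cycle


class HdlModel:
    """Hypothetical stand-in for the RTL part running in an HDL simulator."""
    def __init__(self):
        self.received = []

    def step(self, pin_in):
        # Record what the processor drove and answer with an acknowledge count.
        self.received.append(pin_in)
        return len(self.received)


def cosimulate(iss, hdl, cycles):
    """Lock-step cosimulation: both simulators advance one cycle per iteration
    and the boundary signals are exchanged after every step."""
    to_iss = 0
    for _ in range(cycles):
        to_hdl = iss.step(to_iss)   # ISS outputs become HDL inputs
        to_iss = hdl.step(to_hdl)   # HDL outputs reach the ISS in the next cycle
    return hdl.received


print(cosimulate(IssModel(), HdlModel(), 3))  # [1, 2, 3]
```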



5.2. Application examples 

Many applications can be mapped onto the architecture platform presented above; of 
course, performance and cost aspects must be taken into account. We have used this 
platform to implement two applications: a Packet Routing Switch and an IS-95 
CDMA mobile station. As mentioned in section 4, to map an application onto an 
architecture platform we have to fix the application-specific parameters. The 
architecture generation tool then takes care of configuring the platform and 
generating the application-specific architecture.

5.2.1. Packet Routing Switch. A packet routing switch is a powerful solution for large frame 
or cell switching systems. The version presented here is a simplified one; it consists 
of two input controllers and two output controllers. Each controller handles 
one communication channel, and the communication links between input and output 
controllers are configured by an external signal to be either direct or switched. We chose two 
architectures to implement this switch, one with four processors (two ARM7 and two 
MC68000) and the other with only two processors (one ARM7 and one MC68000). 
Configuring the architecture platform with these parameters led to the two 
architectures shown in figure 5. 

Figure 5. Two implementations of the Packet Routing Switch
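The behavior of the simplified switch fits in a few lines. This sketch assumes that in direct mode input controller i forwards to output controller i and that in switched mode the two paths are crossed; the text only says the links are "direct or switched", so this interpretation and the function name `route` are assumptions.

```python
def route(packets, switched):
    """Route one packet per input controller of the simplified 2x2 switch.

    packets:  two packets, packets[i] arriving at input controller i
    switched: external configuration signal (False = direct, True = crossed)
    Returns the packets delivered to output controllers 0 and 1.
    """
    if len(packets) != 2:
        raise ValueError("the simplified switch has exactly two inputs")
    outputs = [None, None]
    for i, pkt in enumerate(packets):
        dest = 1 - i if switched else i   # crossed or straight-through path
        outputs[dest] = pkt
    return outputs


print(route(["a", "b"], switched=False))  # ['a', 'b']
print(route(["a", "b"], switched=True))   # ['b', 'a']
```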

5.2.2. IS-95 CDMA mobile station. In an IS-95 CDMA cellular phone system, 
the mobile station contains a call processor, a QCELP vocoder, and two modems 
(receiver and transmitter). Many architectural solutions are conceivable; we chose 
one with four processors. The chosen architectural solution is shown in figure 6 (we did 
not implement the call processor).




Figure 6. Implementation of IS-95 CDMA mobile station 

This architecture differs from the four-processor architecture 
presented in figure 5 in the I/O of each processor, the interconnections between 
processors, and the external I/O.

As stated in section 5.1, these architectures have been validated by cosimulation 
at RT level. Figure 7 shows the cosimulation architecture of the four-processor 
architecture implementing the Packet Routing Switch shown in figure 5.




Figure 7. Cosimulation architecture of the packet routing switch 
5.3. Analysis of the results 

Many other applications of different scales can be mapped onto this single architecture 
platform. For this example, the architecture generation was done manually and it 
took about one day to generate one architecture. However, once the architecture 
generation tool is complete, this time should drop significantly, to the time 
needed to capture the application-specific parameters, i.e. a few minutes. 
This example illustrates the feasibility and the efficiency of our architecture model. 
With this model, multiprocessor architectures become much easier to handle. We 
illustrated how the generation of an application-specific architecture can become simple 
and very quick. Note that the architecture model we propose in this chapter is far 
more generic than the architecture platform presented in this example, which 
leads to a very wide application field. Other kinds of software processors (and DSP 
cores) can be integrated and used in the same manner. This shows the great 
flexibility and modularity of the proposed architecture model.

The modularity of our architecture model appears in the organization scheme, 
which consists of separate modules communicating through a communication 
network. It separates the behavior from the communication between sub-systems. In 
addition, each module can be designed separately, and an assembly scheme is provided 
to connect the modules efficiently and to enable the reuse of existing ones. This 
assembly scheme is well structured and makes reconfiguration of the 
architecture easy. Thus, technology choices can be made late in the design process, which 
leads to great flexibility. The scalability of our architecture model is also achieved 
through the assembly scheme and depends on the scalability of the chosen 
communication network. This scalability allows the proposed architecture 
model to be adapted to applications of different complexity, for instance by increasing the 
number of processors or communication buses.

It is worth noting that we consider the software processor cores without their 
peripheral components. This makes our approach more generic and optimized, as 
various issues emerge when considering these peripheral components (software 
targeting, communication interfaces, etc.). 

Performance aspects of the architectures generated according to the proposed 
architecture model are the subject of our ongoing work.



6. Conclusions 

In this chapter, we presented a generic architecture model for multiprocessor 
embedded system-on-chip design. The proposed model is modular, flexible and 
scalable. It permits an efficient generation of multiprocessor architectures for 
embedded systems on chip. This work is a promising step towards the definition 
of an efficient multiprocessor SoC design environment applicable to a large 
application domain. The key point when defining such an environment is to define 
the architecture model that fixes the class of architectures handled by the system. 
The model allows automatic architecture generation. This chapter focused on the 
definition of the architecture model. The feasibility and effectiveness of this 
architecture model were illustrated by a significant demonstration example.
This paper presents an approach that enables microcontrollers 
to execute Java applications with very small resource 
consumption compared to existing Java execution environments. 
The approach is based on exploiting the distributed 
computing power available in distributed controller networks.



1. Introduction 

About 98 % of the more than eight billion processors produced in the year 2000 will be used 
in the embedded systems market [11]. About 57 % of these will be 8-bit 
processors. Many of these microcontrollers will be interconnected using 
networking technologies that have little in common with the Internet. Instead, special-purpose 
technologies such as CAN, FireWire or Bluetooth are used to establish a 
controller network. Interesting examples of such networks, or distributed systems, 
are today’s cars. As cars are mass-produced goods, every cent counts and there is 
always pressure to use the cheapest technology possible.
Next-generation cars will be connected to the Internet via gateways to receive 
information from, and provide information to, the outside world. Car users will be able 
to run different applications on the computing infrastructure (e.g. computer-assisted 
navigation). These applications have much in common with normal desktop 
applications, i.e. they need much more computational power than is available in 
today’s controller networks. This power will be provided by embedded processors 
with the power of desktop processors. These supplementary processors have 
to cooperate with the controller network to access information about the car and to 
control its behaviour. The high processing power of the application processors 
allows the use of modern software techniques and languages.

The Java language and its supporting technologies such as RMI are very interesting 
in this context, as Java allows writing portable code that runs in the same way on 
different systems. Different cars can therefore have different hardware, as long as it 
is able to run Java; a car navigation application, for instance, will run in all cars. 
Ideally, all processors in the car should be able to run Java applications.

Unfortunately, the microcontrollers currently used in controller networks are 
not suited to running “normal” Java applications due to resource limitations. The 
controllers are not equipped with enough RAM and ROM to run a standard Java 
execution environment. Upgrading a node to make it Java-ready is in many cases 
not feasible due to limited budget, electrical power consumption or heat dissipation.

In this paper we discuss different possible ways to make embedded nodes Java-ready. 
We then present our solution to this problem, which is based on distributed 
computing principles. Based on measurements with our prototype implementation, 
we discuss the feasibility of the approach. The last section contains concluding 
remarks and presents some plans for future development in the project.



2. The Network is the Computer 

One way out of the dilemma described above is resource sharing in a distributed 
environment. To use Java with all its power even on small nodes with limited 
resources, several approaches are possible:

Distributed applications. The application itself is designed as a distributed application, 
and some parts are executed remotely. This approach requires a full JVM with RMI (or similar 
mechanisms) support. Even for simple applications, engineers have to deal with 
distribution or rely on packages that hide the distribution from the programmer. 
Examples are BORG from Microsoft [3], which provides the view of a single JVM to 
applications running on different nodes, and the cJVM from IBM [2], a similar solution 
for a CPU cluster.

Distributed libraries. The application’s computation is done on the local node, but 
the libraries that form the API are implemented in a distributed manner. Because applications 
use only the standardized API, the distribution is transparent to them. However, this 
requires reimplementation of the JDK API. The minimum requirements are a JVM 
and some communication mechanism accessible from Java.
Distributed JVM. The JVM itself implements its services in a distributed manner. 
This allows the use of unmodified Java libraries and applications. Resource-intensive 
parts of the JVM are located on different nodes. For example, the class loader and 
byte code verifier could be candidates for remote execution. Sun’s kvm 
environment [4] uses a remote preverifier and packager for byte code 
preparation. The Jbed [4] Java environment can use a bytecode-to-native compiler 
which can run either on the target node (for dynamic byte code compilation) or on 
a host computer (for precompilation into machine code).

Distributed JVM runtime support. The JVM requires a number of services to be 
provided by the base system. For instance, the class loader requires access to files 
containing class implementations. Many of these services can be implemented by 
some kind of remote invocation of services on other machines.

Scalable JVM. The JVM itself is scalable: it provides only the functionality 
required by the actual applications. For example, if no object is ever destroyed, no garbage 
collection support is necessary. Sun’s kvm is a good example of this approach, but 
the kvm approach does not work automatically; the user must decide which 
configuration is required. Another idea is to omit a real JVM altogether and use a 
Java-to-native compiler. The compiler generates machine code from the Java 
application, and the linker puts together the required parts of the runtime 
system. An example of this approach is GCJ [10].

Figure 1: Memory usage of HelloWorld (legend: shared memory, shared libraries, operating system, application)

These approaches are not mutually exclusive; they can and should be combined 
in order to achieve minimal resource usage on the node. In particular, the approaches 
that require a full JVM on a node cannot be used stand-alone in most embedded 
contexts. Our measurements¹, depicted in figure 1 for a simple “HelloWorld”, showed 
an enormous amount of memory required by standard Java environments. The two 
JVM implementations required around 8 MBytes of memory. Even the kvm² 
configuration used more than one MByte, although Sun claimed in its whitepaper that 
configurations with only 128k memory space are possible. The smallest amount of 
memory was used by the machine code executable generated with GCJ. But typical 
controllers have less than 4k of RAM and between 4k and 256k of ROM.

¹ Figures were taken on an x86 Linux system with IBM JDK 1.3.0, Sun JDK 1.3.0, Sun’s KVM from CLDC 
J2ME and GCJ 2.95.2.

² See the kvm whitepaper [7] for a more detailed discussion.

In particular, realizing parts of the Java runtime support remotely can 
be very effective. For example, Java makes use of TCP/IP protocols for 
communication purposes. Instead of using a local TCP/IP stack implementation on 
each node, a remote stack could be used. Although it might look strange to use 
remote access to a communication stack, it can pay off. The communication between 
the node and the remote stack can be tailored to the architecture of the 
controller network. The assumptions a TCP/IP stack makes about networks do not hold 
for such a network, especially regarding reliability, communication latencies, etc. 
A much smaller implementation of the communication layers is therefore possible on 
embedded nodes, compared to the typical TCP/IP stack size of 50k to 100k.
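A tailored node-to-gateway protocol can be very small. As an illustration only, the sketch below splits a request destined for the remote TCP/IP stack into classic CAN data frames, which carry at most 8 payload bytes; it assumes one header byte per frame holding a 7-bit sequence number and a "last fragment" flag. This framing is hypothetical and not the protocol used in the project.

```python
MAX_CAN_PAYLOAD = 8           # classic CAN data field size in bytes
CHUNK = MAX_CAN_PAYLOAD - 1   # one byte reserved for the hypothetical header
LAST_FLAG = 0x80


def fragment(message: bytes):
    """Node side: split a message into CAN payloads [header][up to 7 data bytes]."""
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    if len(chunks) > 128:
        raise ValueError("message too long for a 7-bit sequence number")
    frames = []
    for seq, chunk in enumerate(chunks):
        header = seq | (LAST_FLAG if seq == len(chunks) - 1 else 0)
        frames.append(bytes([header]) + chunk)
    return frames


def reassemble(frames):
    """Gateway side: check the sequence numbers and rebuild the message."""
    data = bytearray()
    for expected, frame in enumerate(frames):
        header = frame[0]
        if header & 0x7F != expected:
            raise ValueError("frame lost or out of order")
        data += frame[1:]
        if header & LAST_FLAG:
            return bytes(data)
    raise ValueError("last fragment missing")


frames = fragment(b"GET /traffic HTTP/1.0")
print(len(frames))         # 3
print(reassemble(frames))  # b'GET /traffic HTTP/1.0'
```

Since CAN controllers already detect errors and retransmit corrupted frames at the link layer, such a protocol can leave out much of what makes a TCP/IP stack large.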



3. Tailoring Java 




Figure 2: Sample embedded car network 

One important constraint for our solution is the ability to support the complete Java 
language. Therefore the full JVM functionality must be available even on small 
nodes, not a restricted one like the JavaCardVM [6]. But closer examination shows 
that most applications in the embedded area do not make full use of the JVM but 
require only a certain subset of its functions. For instance, many applications do not 
require dynamic class loading. Other applications do not need garbage collection. 
Some applications may not need any of these services. So rather than implementing 
a single one-size-fits-all solution, we propose to use a family of JVM implementations. 
Depending on the required functionality and the resources available on the target node, 
different configurations can be used.
Figure 2 depicts a possible configuration. The rightmost system is the most 
powerful node and uses a normal JVM. It implements a gateway and acts as a host 
node that provides remote access to services. The two other nodes communicate with 
that node via CAN bus protocols and can access the Internet using the host node’s 
TCP/IP stack. In general, services that do not fit on a node together with the 
application need to be provided via a remote host node. In the setup shown, 
the left node uses a JVM that executes bytecode, and the middle node executes a 
Java application that has been compiled into native machine code.
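Choosing a family member can be seen as picking the smallest configuration that covers the features an application needs and still fits the node. The member names, feature sets and sizes below are invented for illustration (they are not measured JPure figures); only the selection rule is the point. When no member fits, the missing services would have to come from a host node.

```python
# Hypothetical family members: (name, provided features, ROM kB, RAM kB).
FAMILY = [
    ("native-minimal", frozenset(), 16, 2),
    ("native-gc", frozenset({"gc"}), 24, 4),
    ("native-gc-classload", frozenset({"gc", "dynamic_classloading"}), 64, 16),
    ("bytecode-vm", frozenset({"gc", "dynamic_classloading", "bytecode"}), 160, 64),
]


def select_member(required, rom_kb, ram_kb):
    """Return the smallest member that provides all required features and fits."""
    candidates = [
        m for m in FAMILY
        if required <= m[1] and m[2] <= rom_kb and m[3] <= ram_kb
    ]
    if not candidates:
        return None  # services must then be provided remotely by a host node
    # Prefer the member using the least RAM, then the least ROM.
    return min(candidates, key=lambda m: (m[3], m[2]))[0]


print(select_member(set(), rom_kb=32, ram_kb=4))           # native-minimal
print(select_member({"gc"}, rom_kb=32, ram_kb=4))          # native-gc
print(select_member({"bytecode"}, rom_kb=128, ram_kb=32))  # None
```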

3.1 Overview

Our solution for building an embeddable Java execution environment is based on the 
combination of open source tools and our embedded operating system family Pure 
[1]. The Java environment is provided by the GCJ Java-to-native compiler with its 
runtime support library libGCJ. For the (optional) execution of byte code, the 
KaffeVM [5] can be embedded. The basic building blocks of our execution environment 
are shown in figure 3.




Figure 3: Runtime system building blocks 



3.2 Java Runtime Support Layer 
The libGCJ provides the runtime support for the machine code generated by 
GCJ. It is a C++ library which provides the functionality of Sun’s JDK 1.1.8 
with some JDK 1.2 extensions⁴.

The original libGCJ is not suitable for embedded targets. Although it required 
the least memory for our HelloWorld example, it is far from being able 
to run on deeply embedded platforms. The main problem is the tight coupling of the 
different library functionalities: even if a specific function is not used, it is included 
in the executable. For instance, standard stream objects are always initialized, which 
causes all scalar data type functions (conversion functions etc.) to be included.

We modified the libGCJ to make it more modular, using several methods. 
The first step was to introduce conditional compilation statements 
into the code, which allow static, compile-time decisions about the functionality 
provided by the library. This enables coarse-grained configuration. We 
applied this technique, for instance, to the garbage collector and the floating point 
support. But this method is not applicable in every case: using conditional 
compilation statements everywhere for configuration purposes can make the source 
code difficult to read and maintain. This is especially relevant for configurable 
properties which are not implemented in a single place but have their code 
distributed among many different parts of the system.

For further optimization of the existing libGCJ we use a combination of tools 
which analyze and modify the binary object code stored in the library. This allows us to 
replace a function in a library with a different implementation without changing 
the source. We use this, for instance, to replace the unwanted initialization functions 
for stream objects with an empty function.

Further modifications which allow fine-grained adaptation of the libGCJ 
functionality require replacing parts of the library with new implementations. 
The idea is to replace the general-purpose, one-size-fits-all implementation 
not by a single new implementation but by a set of implementations, following a 
family-based approach [8].

3.3 Posix Emulation Layer 

The libGCJ library requires a runtime system that provides a set of 
POSIX-compatible system calls. The Pure operating system family does not have POSIX as 
its native API. To solve this problem we introduced new Pure family members, which 
translate POSIX calls into the object-oriented world of Pure. 
The other, very important function of these members is the ability to redirect POSIX 
calls to other nodes. This is a realization of the distributed runtime support approach 
described in section 2.

Closer examination of the functionality required by libGCJ reveals that only a 
very limited number of functions need to be provided locally. These are shown in the 
lower left box of figure 3. All other functions are candidates for remote execution: they 
can either be handled locally if resources are available (e.g. access to a disk) or be 
forwarded using the system-call proxy.



⁴ Functionality as of autumn 2000.
The POSIX call layer consists of a family of different implementations for executing 
such a call. One family extension supports local execution of a limited set of POSIX 
calls. A second family extension supports the remote execution of POSIX calls on a 
different node. A third extension switches between local and remote 
execution depending on the call arguments.
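The three family extensions can be sketched as interchangeable implementations of one call interface. This sketch assumes `open`/`write`-style calls and lets the switching member decide by path (paths under a hypothetical `/local/` prefix stay on the node); it hands out its own descriptors so that later calls reach the member that opened the file. Class names and the prefix rule are illustrative assumptions, and the remote member only records requests instead of talking to a real system-call proxy.

```python
import itertools


class LocalPosix:
    """Family member executing a limited set of calls on the node itself."""
    def __init__(self):
        self._fds = itertools.count(3)

    def open(self, path):
        return next(self._fds)

    def write(self, fd, data):
        return len(data)


class RemotePosix:
    """Family member forwarding calls to a host node (here: only recorded)."""
    def __init__(self):
        self.sent = []                 # stands in for the system-call proxy
        self._fds = itertools.count(3)

    def open(self, path):
        self.sent.append(("open", path))
        return next(self._fds)

    def write(self, fd, data):
        self.sent.append(("write", fd, data))
        return len(data)


class SwitchingPosix:
    """Family member choosing local or remote execution from the arguments."""
    def __init__(self, local, remote, local_prefix="/local/"):
        self.local, self.remote, self.local_prefix = local, remote, local_prefix
        self._table = {}               # virtual fd -> (member, member's own fd)
        self._next = itertools.count(3)

    def open(self, path):
        member = self.local if path.startswith(self.local_prefix) else self.remote
        vfd = next(self._next)
        self._table[vfd] = (member, member.open(path))
        return vfd

    def write(self, vfd, data):
        member, fd = self._table[vfd]
        return member.write(fd, data)


remote = RemotePosix()
posix = SwitchingPosix(LocalPosix(), remote)
fd1 = posix.open("/local/config")
fd2 = posix.open("/net/maps/city.dat")
posix.write(fd1, b"abc")
posix.write(fd2, b"xy")
print(remote.sent)  # [('open', '/net/maps/city.dat'), ('write', 3, b'xy')]
```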

The implementation of remote call execution uses a very simple remote 
procedure call (RPC) protocol. It supports only three data types: Integer, Character 
and Byte Array. If a POSIX call requires structured data as arguments, the data are 
translated into a byte array on the client, and the server has to know which data 
format the client uses. In most cases, however, no structured data are needed. If the 
functionality provided by this implementation does not meet the requirements of the 
application, new family extensions may provide the additional functionality.
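A minimal sketch of such a three-type encoding, assuming a one-byte call number and argument count, then per argument a tag byte followed by a 4-byte big-endian integer, a single character byte, or a 2-byte length plus the raw bytes. The tags and layout are hypothetical; they only show how little code three data types require.

```python
import struct

TAG_INT, TAG_CHAR, TAG_BYTES = 0x01, 0x02, 0x03


def marshal_call(call_id, args):
    """Client side: encode a call number and its int / 1-char str / bytes args."""
    out = bytearray(struct.pack(">BB", call_id, len(args)))
    for a in args:
        if isinstance(a, int):
            out += struct.pack(">Bi", TAG_INT, a)
        elif isinstance(a, str) and len(a) == 1:
            out += struct.pack(">Bc", TAG_CHAR, a.encode("latin-1"))
        elif isinstance(a, (bytes, bytearray)):
            out += struct.pack(">BH", TAG_BYTES, len(a)) + bytes(a)
        else:
            raise TypeError(f"unsupported argument type: {type(a).__name__}")
    return bytes(out)


def unmarshal_call(msg):
    """Server side: decode a message produced by marshal_call."""
    call_id, argc = struct.unpack_from(">BB", msg, 0)
    pos, args = 2, []
    for _ in range(argc):
        tag = msg[pos]
        pos += 1
        if tag == TAG_INT:
            (value,) = struct.unpack_from(">i", msg, pos)
            pos += 4
        elif tag == TAG_CHAR:
            value = msg[pos:pos + 1].decode("latin-1")
            pos += 1
        elif tag == TAG_BYTES:
            (n,) = struct.unpack_from(">H", msg, pos)
            pos += 2
            value = msg[pos:pos + n]
            pos += n
        else:
            raise ValueError(f"unknown tag {tag}")
        args.append(value)
    return call_id, args


msg = marshal_call(4, [3, "w", b"hello"])
print(len(msg))             # 17
print(unmarshal_call(msg))  # (4, [3, 'w', b'hello'])
```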



3.4 Operating System Layer 

The lowest layer of JPure is provided by Pure, an operating system family for 
deeply embedded systems developed in our group. It is implemented in C++ and 
runs on many different processor types ranging from 8 bit (Atmel AVR) to 64 bit 
(Alpha). The family-based design allows maximal adaptation of the operating system 
to the needs of the application(s) without unnecessary resource consumption. The 
result is high execution speed paired with a low memory footprint.



4. Times and Sizes 

To make our measurements of JPure comparable, we had to choose a platform for 
which other Java implementations are easily available. A Linux system with a 
Pentium 166 CPU was therefore our target system, although it is not a real embedded 
target.



Function              Size    Data part   Percentage
libGCJ + App.         294k    94k         94%
Pure Core             6k      0.5k        2%
Pure Serial Driver    4k      0k          1%
Pure IOLib            5k      0.5k        2%
Remote Posix Calls    2k      0.5k        1%

Figure 4: Memory usage of HelloWorld on JPure

The numbers in figure 4 make it easy to see where the problems are. 
The runtime support for the libGCJ based on JPure takes only 6 percent of the used 
memory; the rest is taken by the libGCJ itself. The overall memory requirement 
does not look too bad compared to the numbers shown in figure 1, but unfortunately a 
third of it is valuable RAM. Most of the RAM is used by global libGCJ objects. 
Further reduction of this number therefore requires removal of unused and 
unnecessary global objects. Ways to achieve this were discussed above. The 
server on the remote nodes needs about 388k on a Linux system (statically linked).


Figure 5: Selected results for EmbeddedCaffeineMark runs (table garbled in extraction; recoverable entries: Java environments GCJ/Linux 2.96, GCJ/Linux 2.95 and Sun JVM 1.2; scores 2167, 2530, 1712, 1825, 1332 and 1178, whose assignment to environments and tests could not be recovered)

The benchmark used to measure Java performance was the 
EmbeddedCaffeineMark from Pendragon [9]. The results are quite interesting. The 
IBM JVM was the clear winner, with a performance several times higher 
than any other Java version. The individual score for the loop test of the 
benchmark, shown in figure 5, suggests that the IBM JVM includes a JIT with 
special optimizations for this kind of benchmark. The second-best result is that of 
GCJ/Linux 2.96. As JPure is currently based on GCJ version 2.95, we expect similar 
results from JPure after porting our modifications to GCJ 2.96. For version 
2.95, JPure and GCJ/Linux have similar scores for all but one test. At present, 
we are not able to explain the huge difference in the loop score, since both platforms used 
the same object file containing the loop test. We will investigate further and 
hope to be able to solve or at least explain this result.



5. Current State and Future Development 

The architecture sketched above is currently being implemented as part of a project 
together with a car manufacturer. The test case is a car which provides many 
telematic and multimedia services to the car users. The current setup consists of 5 PCs, 
which shall be replaced by smaller configurations based on PPC8xx and C16x. The 
host nodes will run Linux (later Windows NT will be possible too), while the 
other nodes run JPure.

A prototype implementation with RS232-based messaging running on x86 PCs 
is complete. The JPure system itself already runs on other platforms, but the 
communication drivers for CAN and Ethernet are not yet fully functional, so only 
self-contained operation of JPure is currently possible on these platforms.

Future extensions will include the partially distributed realization of standard Java 
libraries such as the Abstract Windowing Toolkit. An important extension will be the 
dynamic loading of classes as native code into a running JPure system. The Java-to-native 
compilation could be done on the JPure machine or, if not enough resources 
are available on the local node, on a remote node, with only 
the loader running locally.

The emerging Java processors will be a very interesting target for our approach. 
Although these processors are able to execute bytecode natively with very high 
performance, they still need Java runtime software. If part of this software is moved 
to a remote node, very cheap and fast Java nodes with only small amounts of RAM 
are possible. This could make the use of Java feasible even in areas where 
highly optimized assembly and C code are used today.
The purpose of this paper is to show that the distribution of functionality 
among the embedded objects of a system architecture critically 
influences the quality of the resulting architecture. It is shown that 
gains can be achieved when functionality distribution is guided by 
optimization criteria, starting from an automatically generated centralized 
processing architecture. This architecture is modified in order to 
carry out distributed processing, using two alternatives: 
first, functionality associated with data processing is manually moved from 
the processing class to different external classes; second, 
the entire architecture is re-generated to comply with distributed 
processing. The Object-Oriented paradigm and UML, in conjunction with 
an extension of the data flow diagram (E-DFD) conveying timing 
information, are used for system requirement formalization. SIMOO-RT 
is used as a graphical interface for direct Use-Case and E-DFD 
diagram construction and for simulation. SysObj is used for 
architecture generation and quality assessment. Resident Quality 
Metrics and Criteria are used. Metric values obtained for the 
different solutions show that it is rewarding to carry out 
functionality distribution optimization.



1. Introduction 

Electronic hardware/software (hw/sw) systems design is an increasingly difficult task 
due to highly complex functionality and stringent quality requirements [1]. Additional 
difficulties result from the heterogeneous nature of most common solutions, based on 
the use of complex hw/sw modules and extensive reuse of predefined Intellectual 
Property (IP) cores [2,3] provided by different suppliers, possibly using different 
specification languages. Moreover, both functional and non-functional requirements (such 
as testability or dependability) must be taken into consideration [4,5]. In fact, 
implicit requirements that are not specifically related to the desired functionality 
influence the final implementation [5] and thus must be considered.

In general, several solutions, reflecting different design tradeoffs and 
corresponding to different architectures, satisfy the user’s requirements. A tradeoff 
implies that characteristics are compared and some are valued over others. 
However, not all designers value the same characteristics equally. Therefore, 
consensus on the 'best-fitted' architecture is difficult to reach. Quality Metrics (QMs) 
need to be defined, and decision criteria to support architecture selection have to be 
devised.

In fact, an adequate choice of system architecture is a key factor of success in 
product development. The authors have recently proposed a methodology for system 
architecture generation, selection and reconfiguration, together with a set of design- 
oriented and test-oriented QMs and decision criteria [5-8]. The proposed 
methodology leads to architectural solutions for which cohesive, autonomous, 
loosely coupled and balanced objects (from the complexity and/or performance 
points of view) result. The characteristic of having objects with similar execution 
times can enhance system performance and increase the possibilities of parallel 
operation. 

Another key issue in system design is the decision whether to concentrate or 
distribute intelligence, and to what extent [9]. In fact, not only do economic reasons 
force the reuse of low-cost, existing components, but embedded or external 
functionality also leads to architectures that are more or less complex, with good or bad 
performance, and hard or easy to test. If an experienced design team develops an 
architecture and wants to evaluate the trade-off of moving part of the functionality to 
the system boundaries (i.e., to the dialoguing actors), how can it be evaluated? Is 
the quality of the architecture degraded by it or not? These are the questions 
addressed in this paper, which is organized as follows. In section 2, the 
architectural design approach is described. In section 3, the tools supporting the 
methodology are presented and their main characteristics highlighted. In section 4, a case 
study is described, and the architecture alternatives (for distributed versus 
concentrated intelligence), obtained according to two different strategies, are 
presented and compared. Finally, section 5 draws the main conclusions.



2. Design Methodology 

As referred, a key issue in the design process is the choice of an adequate 
architecture. The architecture should lead to a low-cost solution, with high 
performance of the entire functionality, while satisfying quality and other non- 
functional requirements. 

In most cases, system architecture is defined by experienced designers applying 
previously working solutions. For sw design using the Object-Oriented (OO) paradigm 
[10-12], the design team manually and empirically identifies all system 
classes and objects from a given textual description. At system level, the 
architecture should not incorporate a large number of top-level objects, in order to manage 
complexity [13]. The functionality (object methods, in OO semantics) assigned to 
each object should be such that balanced object complexity (and execution time) 
results, along with loose object associations; this latter feature corresponds to known 
sw quality metrics, such as object cohesion, coupling and autonomy [14,15]. 
Although the valued characteristics of a “good” architecture are those mentioned 
above, there is limited effort to quantify architecture quality. Unless 
obvious drawbacks in the proposed architecture become evident, the architecture is 
accepted, and the design flow continues. Similar considerations apply to hw 
design, for which a strong reuse of pre-verified hw modules, or IP cores, emphasizes 
the trend of a priori object (module) definition [3]. The monolithic integration of 
embedded cores in SOCs (Systems on a Chip) severely restricts object accessibility, 
increasing the need for an efficient use of DFT (Design for Testability) techniques. 
However, these usually emerge late in the design flow, when object behavior is 
mapped into a structural hw netlist of logic elements. 

The authors recently proposed a methodology that supports system designers in 
automatic architecture generation and in the formal quality assessment of the derived 
architectures [5-9]. The underlying principle is to provide an automatic procedure to 
assign functionality to identifiable objects, leading to objects with the characteristics 
outlined above. In Fig. 1, the different phases of the methodology are depicted. 




Fig. 1 - Steps of the proposed methodology 

At specification level, Object-Oriented (OO) modeling techniques [10-12], 
together with the Unified Modeling Language [16] and an extension of Data Flow 
Diagrams (E-DFD), are used. Following UML, actors are identified (together with 
their attributes), as well as their interaction with the system under development. 
Use-case diagrams are used to model actors and system interaction, as well as the 
system’s main functionality. Design economics usually makes the reuse 
of pre-defined macros, or cores (considered as internal, non-reconfigurable objects), mandatory. 
If functional implementation is to be carried out in hw, such IP cores can be legacy 
cores, or cores provided by different IC vendors [2]. These are modeled as internal actors. 

Nevertheless, there is always functionality that cannot be assigned to a 
semantically meaningful class (object). In this case, several classes can be generated 
to encompass this functionality. In our methodology, we propose that class definition 
be carried out based on formal constraints, provided by formally defined QMs and 
clustering criteria. By doing so, class generation becomes almost deterministic and 
corresponds to an optimum distribution of functionality according to a given set of 
parameters. 

To obtain this optimum functional distribution, a graph representation of 
functionality is used. In this TOG (Task-Oriented Graph), nodes are tasks and edges 
are associations between tasks. The graph is the formal mapping of an extended 
DFD, in which (1) processes represent the unassigned functionality and (2) data or 
control represent the corresponding attributes, conditioned by timing constraints (if 
required). We assume atomic processes, and define a task as the set formed by an 
atomic process and the single attribute it updates. Graph partitioning, according to 
different strategies and algorithms, leads to different possible task clusterings, 
enabling an optimal functionality distribution among classes. Each cluster of tasks is 
defined as an object. Design- and test-oriented metrics and decision criteria are then used, 
a posteriori, to select the most adequate architecture. At present, graph partitioning 
strategies value object balancing (in terms of complexity, or performance) and min-cut, 
in order to minimize object association and dependence. 
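The partitioning step can be illustrated with a small sketch. It is not SysObj's algorithm: it assumes a hypothetical six-task TOG with made-up task weights (complexity) and association weights, and finds, by exhaustive search, the minimum-cut k-way clustering whose object weights differ by at most a given tolerance. The `coupling_percent` function is likewise a hypothetical stand-in (the share of association weight crossing object boundaries), not the formal coupling QM of [6]. Exhaustive search only scales to toy graphs; a real tool would need heuristic partitioning.

```python
from itertools import product

# Hypothetical TOG: task -> weight (e.g., complexity or execution time).
TASKS = {"t1": 3, "t2": 2, "t3": 4, "t4": 1, "t5": 3, "t6": 2}
# Hypothetical associations between tasks: (task, task) -> weight.
EDGES = {("t1", "t2"): 5, ("t2", "t3"): 1, ("t3", "t4"): 4,
         ("t4", "t5"): 1, ("t5", "t6"): 5, ("t6", "t1"): 1}


def cut_weight(assign, edges):
    """Total weight of associations whose two tasks land in different objects."""
    return sum(w for (u, v), w in edges.items() if assign[u] != assign[v])


def imbalance(assign, tasks, k):
    """Difference between the heaviest and the lightest object."""
    loads = [0] * k
    for task, weight in tasks.items():
        loads[assign[task]] += weight
    return max(loads) - min(loads)


def coupling_percent(assign, edges):
    """Hypothetical coupling measure: % of association weight crossing objects."""
    return 100.0 * cut_weight(assign, edges) / sum(edges.values())


def partition(tasks, edges, k, max_imbalance):
    """Exhaustive search for the minimum-cut k-way clustering that is balanced
    within max_imbalance; returns (cut, assignment) or None if infeasible."""
    names = sorted(tasks)
    best = None
    for labels in product(range(k), repeat=len(names)):
        if len(set(labels)) < k:  # every object must receive at least one task
            continue
        assign = dict(zip(names, labels))
        if imbalance(assign, tasks, k) > max_imbalance:
            continue
        cost = cut_weight(assign, edges)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


if __name__ == "__main__":
    # One candidate architecture per number of objects.
    for k in (2, 3):
        cost, assign = partition(TASKS, EDGES, k, max_imbalance=1)
        objects = {}
        for task, obj in assign.items():
            objects.setdefault(obj, []).append(task)
        print(k, "objects:", list(objects.values()),
              f"cut={cost} coupling={coupling_percent(assign, EDGES):.1f}%")
```

With these made-up weights, the 3-object solution groups the strongly associated pairs {t1, t2}, {t3, t4} and {t5, t6}, cutting only the three weak associations.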

The most adequate architecture is the one that best meets a set of desirable 
properties. This is evaluated by the values obtained for pre-defined quality metrics 
and criteria. The possibility of automatic architecture generation and quality 
assessment (by means of QM computation), guided by optimization criteria, opens 
the way not only to assess the quality of an architectural solution proposed by a design 
team, but also to compare it with automatically generated solutions. 

However, an additional use of the proposed methodology emerges: the 
optimization of functional distribution (or assignment) to objects, when tradeoff 
analysis must be carried out, e.g., between distributed and centralized processing. In 
fact, in many industrial applications, e.g., in real-time automation systems, practical 
solutions can favor either distributed or centralized processing. In the case study 
presented in section 4, the question is: should we use smart (intelligent) 
sensors and a network of less complex control systems, associated with each 
machine in a wine bottling production line, or local (dumb) sensors and a network 
of more complex control systems? 

In such a case, if a ‘good’ architecture solution for centralized processing has been 
reached, the usual procedure is to manually move part of the processing from the 
system to the external actors. Conversely, if a ‘good’ solution has been derived for 
distributed processing, one can move part of the functionality inside the system 
boundaries. However, are we moving from one optimized solution to another one? 
The claim we make in our work is that it ain’t necessarily so, and that a better 
architectural solution can often be derived. Supporting evidence is presented in section 4. 



3. Tool Support 

The tool environment that supports the methodology integrates the SIMOO-RT 
modeling tool [17,18] and SysObj [6]. A brief description of the main features of 
each tool is presented in the next subsections. 

3.1 SIMOO-RT Modeling Tool 

SIMOO [17] is a framework for the development of object-oriented discrete simulation 
models. Among the objectives proposed for this environment is the possibility 
of specifying simulation models using rules, so that the same models can be used in 
the implementation of the control system for the real entities. The representation of 
simulation entities by means of autonomous elements, based on the idea of active 
objects, suggests not only the definition of a distributed application but also the 
construction of a library of reusable entities. 

The SIMOO tool offers users a modeling tool called MET (Model Editing 
Tool), presented in fig. 2. The modeling approach adopted is hierarchical, allowing 
the definition of several levels of detail, since the first (top-level) class of the model needs to be 
refined into lower-level classes, defining an aggregation relationship. Besides that, 
the user must specify an instance diagram, which guides the generation of the 
executable simulation model. 




Fig. 2: Example of SIMOO-RT model. 

SIMOO-RT extends the original SIMOO [18], incorporating specific features 
that allow the definition of temporal requirements, such as deadlines and periodic 
operations. This environment encourages the use of state machines to describe 
model behavior. SIMOO-RT also supports automatic code generation 
for a real-time operating system, with AO/C++ [19] as the target language. 

3.2 SysObj 

SysObj is a CASHE (Computer Aided Software Hardware Engineering) tool that 
implements the aspects of the methodology related to automatic object 
generation and QM computation. It accepts, as input, a file containing the DFD 
description of system requirements, as provided by the Paradigm Plus CASE tool, or 
modified in order to include testability features. Additionally, task and task 
association weights can be introduced, to reflect either complexity or timing 
requirements. It provides, as output, a discrete set of architectural solutions, one per 
number of system objects, together with the values of the different QMs that guide the 
design team in the selection of the final architecture. At present, SysObj is 
embedded in SIMOO-RT, in the MOSYS environment [7]. 



4. Case-study 

The system in which the case study is embedded is an industrial production line. 
The case study is based on the design of the real-time control system responsible for 
the automation of the production line [20]. Here we only address the aspects that 
are relevant for methodology assessment. The actual architectural solution for the 
centralized control system (referred to as "archit.1") has been derived using an OO 
modeling technique (fig. 3). In the complete development, the system architecture has 
been derived using design-oriented metrics and criteria for deciding the best 
allocation of processes and attributes within objects. 

The analysis to be carried out aims at demonstrating that gains can be obtained 
when optimization algorithms are used for functionality distribution. To do so, 
the system functionality and system context are viewed as one global system. In the 
actual implementation (archit. 1, Fig. 3), the control system makes use of a set of 
sensors distributed along the production line. Control is carried out based on the 
values provided by the sensors. 

The sensors used in the actual implementation are passive sensors. The objective 
of the study is to compare this implementation with alternative implementations 
assuming a distributed processing, based on the use of smart sensors. For that, 
intelligence must migrate from the central processor to the different sensors. 

Two strategies are used for assigning intelligence to the sensors. The traditional 
approach would be to migrate intelligence from the class that represents the sensor 
processor to the class that represents the sensor. We refer to this as "archit.3". In the 
proposed approach, the original E-DFD is modified according to the new objective, 
i.e., to distribute intelligence into the actors. After that, the remaining functionality 
distribution is carried out by SysObj, using optimization algorithms, QM 
computation and decision criteria. A novel architectural solution ("archit.2") is then 
automatically generated. 

The automatic generation of system architecture, as carried out with SysObj, is 
shown in Fig. 3 for archit.1. In this approach, we have retained one of the customer 
constraints, namely one processing unit for each machine in the production line. The 
generated architecture consists of four main objects, responsible for all data processing 
and line control. Sensors have been defined as actors. 

Fig. 3 - Automatically generated OO diagram (archit.1) 

As noted, in the actual (centralized) implementation, there is no ‘intelligence’ 
associated with sensors. In archit.2, tasks within objects are automatically 
reassigned, so that objects of similar complexity result. Again, the selected architectural 
solution corresponds to a 4-object architecture. As expected, when comparing the 4-object 
archit.1 and 2, QM computation by SysObj shows that the system associated 
with the distributed architecture exhibits higher system coupling (from 58.7 to 
63.0%) and lower average object autonomy (from 1.04 to 0.83). The formal QM 
definitions have been presented elsewhere [6]. However, these slight changes in 
metric values do not have a significant impact on architecture quality. In archit.3, 
the tasks associated with the sensors’ data processing have been manually moved to the 
corresponding objects. No other modification of the four main objects of archit.1 has 
been made. In this case, one would expect QM evaluation to reveal an unbalanced 
architecture. 

Three QMs quantifying individual object characteristics are used for comparing 
the three resulting architectures, namely Cohesion, Autonomy and Object Relative 
Weight (ORW). The first two metrics describe the structural aspects of the 
architecture (namely, the strength of object associations), while the third metric 
compares objects in terms of their weights, which quantify either 
complexity or timing response. The results of the evaluation are given in 
Tables 1-3. Fig. 4 depicts the redistribution of methods (or tasks) in archit.2 and 3. 

Table 1 - Central processing architecture (archit.1) 

Archit.1     Cohesion   Autonomy   ORW (%)
Object 1     2.82       1.1        23.08
Object 2     4.14       1.08       26.92
Object 3     3.85       1.25       26.92
Object 4     2.4        0.71       23.08
Average      3.3        1.04       -
Std. Dev.    0.71       0.2        1.92


Table 2 - Automatically generated architecture for distributed processing (archit.2) 

Archit.2     Cohesion   Autonomy   ORW (%)
Object 1     4.76       1.4        28.57
Object 2     2.24       0.5        23.81
Object 3     2.86       0.55       23.81
Object 4     3.06       0.9        23.81
Average      3.23       0.84       -
Std. Dev.    0.93       0.36       2.06



Table 3 - Manually modified architecture for distributed processing (archit.3) 

Archit.3     Cohesion   Autonomy   ORW (%)
Object 1     3.49       1.1        28.57
Object 2     1.43                  14.29
Object 3     4.76       1.25       33.33
Object 4     3.3        0.82       23.81
Average      3.24       0.85       -
Std. Dev.    1.19       0.38       7.04



As can be seen from the ORW values, archit.1 (benchmark) and archit.2 exhibit 
balanced objects, as automatic graph partitioning has been carried out favoring (1) 
object balance and (2) graph min-cut. In contrast, archit.3 (the usual solution) produces 
one clearly unbalanced object (obj.2), with low cohesion and autonomy 
(i.e., strongly dependent on the remaining objects) and a lower ORW. This is reflected 
in the higher standard deviation values. Conversely, obj.3 is more cohesive and 
autonomous than average, and its ORW (33.33%) is well above the 25% average. 
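The Average and Std. Dev. rows of Tables 1-3 can be re-derived from the object rows. The short check below, which uses only Python's standard `statistics` module, suggests that they are population standard deviations (divisor n) rather than sample standard deviations (divisor n - 1). The recomputed population values agree with the printed ones to within about 0.01, while the sample values do not. This reading of the tables is an inference; the text does not state which convention was used.

```python
from statistics import mean, pstdev, stdev

# Per-object QM values (Object 1..4) transcribed from Tables 1-3.
# The autonomy of obj.2 in archit.3 is missing from Table 3 as reproduced
# here, so that column is omitted rather than guessed.
QM = {
    "archit.1": {"cohesion": [2.82, 4.14, 3.85, 2.40],
                 "autonomy": [1.10, 1.08, 1.25, 0.71],
                 "orw": [23.08, 26.92, 26.92, 23.08]},
    "archit.2": {"cohesion": [4.76, 2.24, 2.86, 3.06],
                 "autonomy": [1.40, 0.50, 0.55, 0.90],
                 "orw": [28.57, 23.81, 23.81, 23.81]},
    "archit.3": {"cohesion": [3.49, 1.43, 4.76, 3.30],
                 "orw": [28.57, 14.29, 33.33, 23.81]},
}

# "Std. Dev." rows as printed in the tables, for comparison.
PRINTED_STD = {
    "archit.1": {"cohesion": 0.71, "autonomy": 0.20, "orw": 1.92},
    "archit.2": {"cohesion": 0.93, "autonomy": 0.36, "orw": 2.06},
    "archit.3": {"cohesion": 1.19, "orw": 7.04},
}


def summarize(values):
    """Mean, population std dev (divide by n) and sample std dev (divide by n - 1)."""
    return mean(values), pstdev(values), stdev(values)


if __name__ == "__main__":
    for arch, metrics in QM.items():
        for name, values in metrics.items():
            avg, pop, samp = summarize(values)
            printed = PRINTED_STD[arch][name]
            print(f"{arch:9} {name:9} mean={avg:6.2f} pstdev={pop:5.2f} "
                  f"stdev={samp:5.2f} printed={printed:5.2f}")
```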

As shown in Fig. 4, five tasks have been removed from the original system 
(archit.1), which is the centralized architecture. However, the optimized distributed 
architecture (archit.2) corresponds to a class (or object) definition in which the 
assignment of methods (or tasks) to the objects differs greatly from that of archit.1. In 
fact, object 1 in archit.2, for example, is assigned methods that were originally assigned 
to all 4 objects of archit.1. This example highlights the fact that manually moving 
functionality to the external actors can produce a poor architectural solution. As 
shown in Fig. 4, obj.2 in archit.3 now has only 3 methods assigned to it, as 
compared to the original 7, which explains the quantitative values shown in Table 3. 

Fig. 4 - Method reassignment. In archit.2 and 3, p10, 11, 19, 20 and 26 are moved 
out of the system. 



5. Conclusions 

In conclusion, this paper shows that functionality distribution among embedded 
objects of a system architecture is a critical issue for the quality of the resulting 
architecture. Significant gains can be obtained when functionality distribution uses 
optimization criteria, such as object balance (in terms of task complexity or 
execution time) and autonomy. Centralized and distributed architectures have 
been compared, allowing tradeoff analysis to be performed. To generate a 
distributed architecture starting from a centralized one, two alternatives have been 
used. In the first, functionality is manually moved from the processing class to the 
external actors. In the second, the entire architecture is re-generated to comply with 
distributed processing. For automatic architecture generation, the Object-Oriented 
paradigm and UML, in conjunction with an extended data flow diagram (E-DFD) 
conveying timing information, have been used to formalize system requirements. Two 
proprietary tools have been used to implement the methodology. SIMOO-RT is used as the 
graphical interface for direct Use-Case and E-DFD diagram construction and for 
simulation. SysObj is used for architecture generation and quality assessment, with 
its resident quality metrics and criteria. Finally, it was shown that task 
assignment in the object definition of the optimized distributed architecture differs 
greatly from that of the manually modified architecture. This conclusion clearly 
justifies the usefulness of a methodology for the automatic generation of system 
architectures that values characteristics such as object balance and enhanced autonomy, 
which in turn favor performance, parallelization and testability. 

Modern Embedded Systems-on-Chips (SOCs) will allow the 
system designer to customize Intellectual Property (IP) cores 
(fixed and programmable), together with custom logic and large 
amounts of embedded memories. As the software content in these 
emerging embedded SOCs begins to dominate the SOC design 
process, there is a critical need for support of an integrated 
software development environment (including compilers, 
simulators and debuggers). Furthermore, since many 
characteristics of these processor core IPs (e.g., instruction sets, 
memory configurations) are increasingly customizable, the entire 
software toolkit chain needs to be customized and generated to 
support both early design space exploration (for performance, 
power and cost constraints), as well as high-quality software 
generation. This paper describes our Architecture Description 
Language (ADL) driven approach for customizing software 
toolkits. 



1. Introduction 

The advent of System-on-Chip (SOC) technology has resulted in a paradigm shift for 
the design process of embedded systems employing programmable processors with 
custom hardware. The design processes most impacted by this emerging technology 
include 1) System Design Space Exploration (DSE) and 2) Software design. 
Traditionally, embedded systems developers performed limited exploration of the 
design space using standard processor and memory architectures. Further, software 
development was usually done using existing, stock processors (with supported 
integrated software development environments) or done manually using processor 
specific low-level languages (assembly). This was feasible because the software 
content in such systems was low and also because the processor architecture was 
fairly simple (e.g. no Instruction Level Parallelism features) and well-defined (e.g. 
no parameterizable components). 

The dotted box in Figure 1 shows a contemporary hardware/software co-design 
methodology for the design of traditional embedded systems consisting of 
programmable processors, application specific integrated circuits (ASICs), 
memories, I/O interfaces, etc. This contemporary design flow starts from specifying 
an application in a system design language. The application is then partitioned into 
tasks that are assigned either to software (i.e., executed on the processor) or to 
hardware (ASIC) such that design constraints (e.g., performance, power 
consumption, cost, etc.) are satisfied. After hardware/software partitioning, tasks 
assigned to software are translated into programs (either in high-level languages such 
as C/C++ or in assembly), and then compiled into object code (which resides in 
memory). Tasks assigned to hardware are translated into HDL descriptions and then 
synthesized into ASICs. In traditional co-design systems, the target architecture 
template is pre-defined. Specifically, the processor is fixed or can be selected from a 
library of pre-designed processors, but customization of the processor architecture is 
not allowed. Even in co-design systems allowing customization of the processor, the 
fundamental architecture can rarely be changed. 
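As a minimal sketch of the partitioning step, the brute-force search below assigns each task to software or hardware so as to minimize total execution time under a hardware-area budget. The task names and their timing and area figures are hypothetical; tasks are assumed to execute one after another, and communication costs are ignored. Exhaustive search is only practical for a handful of tasks.

```python
from itertools import product

# Hypothetical task profiles: name -> (software time, hardware time, hardware area),
# in arbitrary time and area units.
TASKS = {"fir": (40, 5, 30), "fft": (90, 12, 60), "ctrl": (10, 8, 25), "crc": (20, 2, 10)}


def partition_hw_sw(tasks, area_budget):
    """Brute-force HW/SW partitioning: minimize total execution time (tasks run
    one after another) subject to a hardware-area budget.
    Returns (time, {task: "sw" or "hw"})."""
    names = sorted(tasks)
    best = None
    for choice in product(("sw", "hw"), repeat=len(names)):
        mapping = dict(zip(names, choice))
        area = sum(tasks[n][2] for n in names if mapping[n] == "hw")
        if area > area_budget:
            continue  # violates the area constraint
        time = sum(tasks[n][1] if mapping[n] == "hw" else tasks[n][0] for n in names)
        if best is None or time < best[0]:
            best = (time, mapping)
    return best


if __name__ == "__main__":
    for budget in (0, 40, 70):
        print(budget, partition_hw_sw(TASKS, budget))
```

With a budget of 70 area units, the search moves `fft` and `crc` to hardware and leaves the rest in software.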




Figure 1. ADL-based co-design flow for SOCs 

In the SOC domain, system-level design libraries increasingly consist of 
Intellectual Property (IP) blocks such as processor cores that span a spectrum of 
architectural styles, ranging from traditional DSPs and super-scalar RISC, to VLIWs 
and hybrid ASIPs. These processor cores typically allow customization through 
parameterization of features (such as the number of functional units, operation latencies, 
etc.). Furthermore, SOC technologies permit the incorporation of novel on-chip 
memory organizations (including the use of on-chip DRAM, frame buffers, 
streaming buffers, and partitioned register files). Together, these features allow 
exploration of a wide range of processor-memory organizations in order to customize 
the design for a specific embedded application. 
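To make the idea of exploring a parameterized core concrete, the sketch below sweeps one hypothetical parameter, the number of identical functional units. For each value, it reports the schedule length that a simple greedy list scheduler achieves on a small made-up dataflow graph. The graph, the latencies and the assumption of fully pipelined units are illustrative only. Memory organization, which the text also stresses, is not modeled.

```python
# Hypothetical dataflow graph of one loop body: op -> (latency, predecessors).
DAG = {
    "a": (1, []), "b": (1, []), "c": (1, []), "d": (1, []),
    "e": (2, ["a", "b"]), "f": (2, ["c", "d"]),
    "g": (1, ["e", "f"]),
}


def list_schedule(dag, units):
    """Greedy cycle-by-cycle list scheduling on `units` identical, fully pipelined
    functional units (each unit can start a new operation every cycle).
    Returns the schedule length in cycles."""
    finish, remaining, cycle = {}, set(dag), 0
    while remaining:
        # An operation is ready once all of its predecessors have completed.
        ready = sorted(op for op in remaining
                       if all(p in finish and finish[p] <= cycle for p in dag[op][1]))
        for op in ready[:units]:
            finish[op] = cycle + dag[op][0]
            remaining.discard(op)
        cycle += 1
    return max(finish.values())


def explore(dag, unit_options):
    """Sweep one architectural parameter and record the resulting schedule length."""
    return {units: list_schedule(dag, units) for units in unit_options}


if __name__ == "__main__":
    print(explore(DAG, [1, 2, 3, 4]))
```

Under these assumptions the sweep reports 8, 5, 5 and 4 cycles for one to four units. For this graph, a third unit therefore buys nothing over two.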

The contemporary co-design flow (which does not permit processor-memory 
customization) limits the ability of the system designer to fully utilize emerging IP 
libraries and restricts the exploration of alternative (often superior) SOC 
architectures. Consequently there is tremendous interest in a language-based design 
methodology for embedded SOC optimization and exploration, in which Architecture 
Description Languages (ADLs) are used to drive DSE and automatic 
compiler/simulator toolkit generation. 

As with an HDL-based ASIC design flow, several benefits accrue from a 
language-based design methodology for embedded SOC design, including the ability 
to perform (formal) verification and consistency checking, to easily modify the target 
architecture and memory organization for DSE, and to automatically generate the 
software toolkit from a single specification. Figure 1 illustrates the ADL-based SOC 
co-design flow, wherein the architecture template of the SOC (possibly using IP 
blocks) is specified in an ADL. This template is then verified or validated using 
formal methods. After verification, the software toolkit is automatically generated 
to be used for software compilation and co-simulation of the hardware and software. 

Another important and noticeable trend in the embedded SOC domain is the 
increasing migration of system functionality from hardware to software, resulting in 
a high degree of software content for newer SOC designs. This trend, combined with 
shrinking time-to-market cycles, has resulted in intense pressure to migrate the 
software development to a high-level language (such as C, C++, Java) based 
environment in order to reduce time spent in system design. 

To effectively explore the processor-memory design space and develop software 
in a high-level language, the designer requires a high quality software toolkit 
(primarily a highly optimizing compiler and cycle-accurate simulator). Compilers for 
embedded systems have been the focus of several research efforts [14] recently. A 
promising approach to automatic compiler generation is the "retargetable compiler" 
approach. A compiler is classified as retargetable if it can be adapted to generate 
code for different target processors with significant reuse of the compiler source 
code. Retargetability is typically achieved by providing target machine information 
(in an ADL) as input to the compiler along with the program corresponding to the 
application. 
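The idea of retargeting by supplying target information as input can be sketched with a toy instruction selector. The selector code below stays the same for every target; only a machine-description table changes. The table format is hypothetical and far simpler than EXPRESSION, nML or ISDL. The two targets (a RISC-like one, and a DSP-like one with a fused multiply-accumulate) are also made up for illustration.

```python
# Hypothetical machine descriptions (not EXPRESSION, nML or ISDL syntax):
# IR operation -> (target mnemonic, latency in cycles). An optional "mac"
# entry says the target has a fused multiply-accumulate instruction.
RISC_LIKE = {"add": ("ADD", 1), "mul": ("MUL", 3)}
DSP_LIKE = {"add": ("add", 1), "mul": ("mpy", 2), "mac": ("mac", 2)}


def select(ir, target, live_out=frozenset()):
    """Target-independent instruction selector driven by a machine description.
    ir: list of (dest, op, lhs, rhs); returns (mnemonic, dest, operands, latency)."""
    out, i = [], 0
    while i < len(ir):
        dest, op, a, b = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        # Fuse "t = a * b; d = t + c" into "d = mac(a, b, c)" when the target
        # offers mac and t is used exactly once, by that add, and nowhere else.
        if ("mac" in target and op == "mul" and nxt is not None
                and nxt[1] == "add" and (nxt[2] == dest) != (nxt[3] == dest)
                and dest not in live_out
                and all(dest not in ins[2:] for ins in ir[i + 2:])):
            addend = nxt[3] if nxt[2] == dest else nxt[2]
            mnemonic, latency = target["mac"]
            out.append((mnemonic, nxt[0], (a, b, addend), latency))
            i += 2
            continue
        mnemonic, latency = target[op]
        out.append((mnemonic, dest, (a, b), latency))
        i += 1
    return out


def total_latency(code):
    """Crude cost: sum of latencies, assuming no overlap between instructions."""
    return sum(ins[3] for ins in code)


if __name__ == "__main__":
    # One multiply-accumulate step of a dot product: s = x * y + acc
    IR = [("t1", "mul", "x", "y"), ("s", "add", "t1", "acc")]
    for name, target in (("RISC-like", RISC_LIKE), ("DSP-like", DSP_LIKE)):
        code = select(IR, target)
        print(name, code, "latency:", total_latency(code))
```

For the RISC-like table the selector emits a multiply and an add (4 cycles under this cost model). For the DSP-like table it emits a single `mac` (2 cycles).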

The compilation process can be broadly broken into two steps: analysis and 
synthesis [1]. During analysis, the program (in HLL) is converted into an 
intermediate representation (IR) that contains all the desired information such as 
control and data dependences. During synthesis, the IR is transformed and optimized 
in order to generate efficient target specific code. The synthesis step is more complex 
and typically includes the following phases: Instruction Selection, Scheduling, 
Resource Allocation, Code Optimizations/Transformations, and Code Generation 
[15]. The effectiveness of each phase depends on the target architecture and the 
application. A further problem during the synthesis step is that the optimal ordering 
between these phases (and other optimizations) is highly dependent on the target 
architecture and the application program. (For example, the ordering between the 
memory assignment optimizations and instruction scheduling passes is critical 
for memory-intensive applications.) As a result, traditionally, compilers have been 
painstakingly hand-tuned to a particular architecture (or architecture class) and 
application domain(s). However, stringent time-to-market constraints for SOC 
designs no longer make it feasible to manually generate compilers tuned to particular 
architectures. 
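Even two textbook passes show this interaction between phases. The sketch below applies constant folding and dead-code elimination to a hypothetical straight-line IR. These are generic passes, not the EXPRESS phases. If dead-code elimination runs first it removes nothing, but after folding it removes two instructions. The sketch also picks an ordering by trying every permutation. This illustrates why exhaustive approaches do not scale: n passes give n! orderings.

```python
from itertools import permutations

# Toy straight-line IR (hypothetical): (dest, op, lhs, rhs), where op is
# "mov" (copy lhs), "add" or "mul"; operands are variable names or ints.


def fold_constants(prog, live_out):
    """Constant propagation and folding in one forward pass (live_out unused)."""
    known, out = {}, []
    for dest, op, a, b in prog:
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "mov" and isinstance(a, int):
            known[dest] = a
        elif op in ("add", "mul") and isinstance(a, int) and isinstance(b, int):
            op, a, b = "mov", (a + b if op == "add" else a * b), None
            known[dest] = a
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        out.append((dest, op, a, b))
    return out


def eliminate_dead_code(prog, live_out):
    """Backward liveness pass: drop instructions whose result is never used."""
    live, out = set(live_out), []
    for dest, op, a, b in reversed(prog):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            out.append((dest, op, a, b))
    return out[::-1]


PASSES = {"fold": fold_constants, "dce": eliminate_dead_code}


def run(prog, live_out, order):
    """Apply the named passes in the given order."""
    for name in order:
        prog = PASSES[name](prog, live_out)
    return prog


def best_order(prog, live_out):
    """Try every ordering (n! of them) and keep the one giving the fewest instructions."""
    return min(permutations(PASSES), key=lambda order: len(run(prog, live_out, order)))


if __name__ == "__main__":
    prog = [("a", "mov", 2, None), ("b", "mul", "a", 3), ("c", "add", "b", "x")]
    for order in permutations(PASSES):
        print(order, run(prog, {"c"}, order))
    print("best:", best_order(prog, {"c"}))
```

Here folding first leaves the single instruction `c = 6 + x`. Running dead-code elimination first keeps all three instructions.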

The ADL-based retargetable compiler approach allows fast generation of a target 
specific compiler from a high-level description of the processor architecture. 
However, in order to be able to generate high-quality code, there is a need for 
techniques that allow customization of the compiler based on the application 
(specified in an HLL) and the architecture (specified in an ADL). In this paper we 
present Transmutations, a compiler customization framework that integrates the 
various embedded system compilation techniques, and allows for dynamic ordering 
between the compiler phases. The Transmutations framework is a part of the 
EXPRESS retargetable compiler for embedded systems. The EXPRESSION ADL is 
used to specify the processor-memory subsystem and achieve retargetability of 
EXPRESS. In Section 2 we present related work in the areas of compiler 
retargetability and customization. In Section 3 we present our EXPRESS compiler 
framework, briefly describing EXPRESSION, EXPRESS and Transmutations. In 
Section 4 we present some preliminary results motivating the need for compiler 
customization. Finally, we conclude with a summary in Section 5. 

2. Related Work 

The problem of generating efficient software toolkits for embedded systems has been 
the focus of several recent research activities. [9] contains a survey of some of the 
recent efforts in automatically generating the software toolkit from a specification of 
the system in an Architecture Description Language (ADL). In this paper we focus 
on the problem of increasing the efficiency of the compiler by customizing it for the 
given application and architecture (and also the design goals). 

Previous research in the area of compilers for embedded systems has resulted in 
the development of various techniques to increase the efficiency of the compiler for 
performance, code-size and power goals. Also, there have been some projects that 
aim to incorporate these individual techniques in a complete compiler flow for a 
(narrow) range of processors. The CodeSyn [13] project demonstrated a compiler for 
a limited embedded processor class with irregular architectures. CHESS [12] is a 
retargetable code generation environment for fixed-point DSP processors. CHESS 
uses the nML [4] ADL to achieve retargetability. The AVIV [10] compiler, using the 
ISDL [7] ADL, produces machine code optimized for size. The MSSQ and 
RECORD compilers use the MIMOLA [2] ADL to achieve retargetability. MSSQ 
is able to produce microcode for a large range of datapath architectures, but suffers 
from low code quality. The RECORD compiler, however, targets mainly DSP 
architectures. 

The quality of generated code is heavily influenced by the ordering of the 
compiler optimizations (also known as phases). Most compilers rely on a 
predetermined ordering of the phases. However, as these phases are mutually 
dependent and may adversely affect each other, this approach is sub-optimal when 
retargeting the compiler for a wide range of processors and applications. 
Simultaneous execution of all the phases in order to avoid restricting the solution 
space is not practical because of the large number of optimizations. An example of 
this is the Integer Linear Programming based approach proposed in [22], which 
suffers from extremely high runtime requirements. 

In recent years, techniques that integrate some optimizations in order to mitigate 
the phase ordering problem have been reported. For example, instruction scheduling 
and register allocation have been integrated in [17], [3]. However, most such 
techniques have only considered RISC like architectures with homogeneous register 
files. The AVIV compiler attempts to solve the phase ordering problem by 
performing a heuristic branch-and-bound step that executes resource 
allocation/assignment, operation grouping, and scheduling concurrently. The CHESS 
compiler uses data routing as a technique to simultaneously solve the problems of 
code selection and register allocation. Mutation Scheduling (MS) [18] integrates 
code selection and register allocation into instruction scheduling by "adapting" the 
computation of values to conform to varying resource constraints and availability. As 
the problems are NP-hard, MS depends on heuristic guidance to limit the search 
space. However, MS only integrated the traditional compiler phases and mainly 
considered homogeneous architectures. Transmutations incorporates MS and further, 
as explained in Section 3.3, provides for changing the ordering of other embedded 
systems optimizations (such as memory optimizations, SIMD, etc.). 

3. EXPRESS Retargetable Compiler 

To compile effectively for modern embedded processor architectures, the 
compiler needs to incorporate a large set of optimizations. These optimizations may 
target different aspects of the architecture (e.g., conditional instructions) or the 
application (e.g., SIMD). The EXPRESS retargetable compiler adopts a "toolbox" 
approach to incorporating both traditional and embedded-systems-specific compiler 
optimizations. However, in such an approach, the phase ordering between the 
optimizations has a huge impact on the quality of the generated code. The problem of 
determining the 'optimal' phase ordering is further complicated by the fact that most 
applications have regions with different characteristics (e.g., loop regions, if-block 
regions, etc.), which require different optimization orderings. Statically determined 
phase orderings may not be able to satisfy stringent constraints on performance, 
power, code size, etc. The compiler needs the ability to determine dynamically, 
based on the region(s) of interest, the best ordering of optimizations. The EXPRESS 
compiler incorporates Transmutations, an approach that attempts to provide 
dynamic ordering of the phases based on the program characteristics and available 
resources. The detailed architectural information needed by EXPRESS to retarget 
itself efficiently is derived from a high-level description of the processor-memory 
subsystem in EXPRESSION. In the following, we first present a brief overview of 
the EXPRESSION ADL and the EXPRESS compiler, and then describe the 
Transmutations framework in detail. 



3.1 EXPRESSION ADL 

We use EXPRESSION, an ADL designed to support design space exploration (DSE) 
of a wide class of processor architectures (RISC, DSP, ASIP, and 
VLIW) coupled with a variety of memory system organizations and hierarchies [8], in 
order to retarget EXPRESS, an optimizing, memory-aware ILP compiler. 
EXPRESSION contains an integrated specification of both structure and behavior of 
the processor-memory system. The structure is a net-list of components (i.e., units, 
storages, ports, and connections), as well as the valid unit-to-storage or storage-to- 
unit data transfers. The pipeline architecture is described as the ordering of units 
which comprise the pipeline stages, plus the timing of the multi-cycled units. The 
behavior describes the instruction set (IS) in a hierarchical manner. Each operation is defined in 
terms of its opcode, operands, and format. Each instruction is viewed as a list of slots 
to be filled with operations. In EXPRESSION, resource conflicts between 
instructions are not explicitly described, but reservation tables (RTs) specifying the 
conflicts are automatically generated and passed to the ILP compiler [6]. Since 
manual description of RTs for deeply pipelined or VLIW processors is cumbersome 
and error-prone, this ability to automatically generate RTs facilitates rapid DSE of 
SOC architectures. Another key feature of EXPRESSION is the ability to specify 
novel memory systems including memory hierarchies, on-chip DRAM, frame 
buffers, partitioned memory address spaces, etc. EXPRESSION is also able to 
automatically generate the timing behavior of each operation based on the timing 
behavior of architectural components [5]. The architecture specific resource 
constraint and timing information is then used by the EXPRESS compiler to retarget 
its optimizations and produce code tailored for that particular architecture. 
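The automatic generation of reservation tables mentioned above can be illustrated with a small sketch. It is not EXPRESSION's actual generator: it assumes a simplified model in which an operation's path through the pipeline is an ordered list of units, each held for a given number of cycles, and the unit names and latencies (`IF`, `ID`, `MUL`, `ALU`, `WB`) are hypothetical. Two operations conflict at issue distance d if their tables overlap once the second is shifted by d cycles.

```python
from typing import Dict, List, Set, Tuple

# A reservation table (RT): the (cycle offset, unit) pairs an operation
# occupies, counted from the cycle in which it is issued.
ReservationTable = Set[Tuple[int, str]]


def build_reservation_table(path: List[str], latency: Dict[str, int]) -> ReservationTable:
    """Derive an RT from the ordered pipeline units an operation flows through.
    `latency` gives how many cycles a multi-cycle unit is held; units not
    listed are assumed to take a single cycle."""
    table: ReservationTable = set()
    cycle = 0
    for unit in path:
        for _ in range(latency.get(unit, 1)):
            table.add((cycle, unit))
            cycle += 1
    return table


def conflicts(first: ReservationTable, second: ReservationTable, distance: int) -> bool:
    """True if issuing `second` `distance` cycles after `first` makes both
    operations claim the same unit in the same cycle."""
    shifted = {(cycle + distance, unit) for cycle, unit in second}
    return bool(first & shifted)


def forbidden_distances(first: ReservationTable, second: ReservationTable) -> List[int]:
    """Issue distances at which `second` may not follow `first`."""
    horizon = max(cycle for cycle, _ in first) + 1
    return [d for d in range(horizon) if conflicts(first, second, d)]


# Hypothetical 4-stage pipeline with a two-cycle, non-pipelined multiplier.
MUL_RT = build_reservation_table(["IF", "ID", "MUL", "WB"], {"MUL": 2})
ADD_RT = build_reservation_table(["IF", "ID", "ALU", "WB"], {})
```

Under these assumptions, `forbidden_distances(MUL_RT, MUL_RT)` is `[0, 1]` because the two-cycle multiplier is still busy, and `forbidden_distances(MUL_RT, ADD_RT)` is also `[0, 1]`, this time because at distance 1 both operations would reach the single write-back stage in the same cycle.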



3.2 EXPRESS 

EXPRESS is an optimizing, memory-aware, Instruction Level Parallelizing (ILP) 
compiler. EXPRESS uses the EXPRESSION ADL [8] to retarget itself to a wide 
class of processor architectures and memory systems. Figure 2 shows the EXPRESS 
compiler along with the Transmutations framework. The inputs to EXPRESS are the 
application specified in C and the processor architecture specified in EXPRESSION. 
The front-end is GCC-based and performs some conventional optimizations. The 
core transformations in EXPRESS include RDLP [19], a loop pipelining technique; 
TiPS (Trailblazing Percolation Scheduling) [16], a speculative code motion 
technique; Instruction Selection; Register Allocation; and If-Conversion, a technique 
for architectures with predicated instruction sets. The back-end generates assembly 
code for the processor ISA. We use SIMPRESS [11], a cycle-accurate structural 
simulator, to analyze the performance of the generated code. 
Figure 2. EXPRESS Framework 



3.3 Transmutations Framework 

Mutation Scheduling (MS) attempts to couple the phases of Instruction Selection and 
Register Allocation into the ILP Scheduler by providing semantically equivalent 
computations of program values that have different resource usage patterns. MS 
adopts a "local" view of the search space by only providing for mutations of values 
through algebraic transformations. Transmutations incorporates MS and also 
provides a framework for phase-ordering between all transformations, including the 
traditional compiler optimizations and memory optimizations. Furthermore, 
Transmutations attempts to customize the compiler for a wide variety of architecture 
styles including RISC, VLIW and Superscalar. 
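As an illustration of a semantically equivalent computation with a different resource usage pattern, the following sketch implements Tree Height Reduction on expression trees. It assumes integer addition, which is associative and commutative; for floating point, reassociation can change results, so a real compiler would restrict it. The tuple representation is hypothetical and is not the EXPRESS intermediate representation.

```python
from typing import Dict, List, Tuple, Union

# An expression is either a variable name or a tuple (op, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def height(e: Expr) -> int:
    """Length of the longest chain of dependent operations."""
    if isinstance(e, str):
        return 0
    _, left, right = e
    return 1 + max(height(left), height(right))


def flatten(e: Expr, op: str) -> List[Expr]:
    """Collect the operands of a chain of the same associative operator."""
    if isinstance(e, tuple) and e[0] == op:
        return flatten(e[1], op) + flatten(e[2], op)
    return [e]


def balance(operands: List[Expr], op: str) -> Expr:
    """Rebuild an operand list as a balanced binary tree of `op`."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balance(operands[:mid], op), balance(operands[mid:], op))


def tree_height_reduction(e: Expr, op: str = "+") -> Expr:
    return balance(flatten(e, op), op)


def evaluate(e: Expr, env: Dict[str, int]) -> int:
    """Reference interpreter for '+' and '*' on integers."""
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


serial = ("+", ("+", ("+", "a", "b"), "c"), "d")   # ((a + b) + c) + d
balanced = tree_height_reduction(serial)            # (a + b) + (c + d)
```

The serial form has height 3 and needs only one adder per cycle; the balanced form has height 2 but needs two adders in its first cycle. That is the kind of trade-off a mutation-aware scheduler weighs against the resources that are actually free.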

Through the Transmutations framework, the EXPRESS compiler is able to 
dynamically "adapt" both the program code and the order of transformations based 
on resource availability and program region characteristics. Examples of code 
mutations [18] possible in EXPRESS include architecture-independent mutations 
such as Tree Height Reduction (THR), and architecture-specific mutations such as 
Strength Reduction, Synonyms, etc. Each code mutation has a cost function that 
determines its impact on performance, code size, memory accesses, etc. The heuristics 
in the Transmutations framework use this information to assign priorities to 
the mutations based on resource availability. The heuristics also determine the 
ordering of the compiler phases. Transmutations also allows for user guidance 
through the Transmutations Control Script, as shown in Figure 2. In the script, the 
user can specify new mutation transformations and strategies for phase ordering, and 
can also specify the heuristics and cost functions. This allows the user to customize the 
compiler for the application and processor domain. 
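The cost-driven selection described above might look roughly like the sketch below. Everything in it is an assumption for illustration: the `Mutation` fields, the unit names, the linear weighted cost, and the numbers attached to the two example mutations. The weights play the role that a design goal (or the Transmutations Control Script) would play when trading performance against code size and memory traffic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mutation:
    """A candidate code mutation with estimated effects (all hypothetical)."""
    name: str
    delta_cycles: int      # change in schedule length
    delta_size: int        # change in code size, in operations
    delta_mem: int         # change in memory accesses
    units: Dict[str, int] = field(default_factory=dict)  # units needed in one cycle


def cost(m: Mutation, weights: Dict[str, float]) -> float:
    """Weighted cost of applying `m`; lower is better."""
    return (weights.get("cycles", 0.0) * m.delta_cycles
            + weights.get("size", 0.0) * m.delta_size
            + weights.get("mem", 0.0) * m.delta_mem)


def fits(m: Mutation, free_units: Dict[str, int]) -> bool:
    """True if the currently free functional units can host the mutation."""
    return all(free_units.get(unit, 0) >= n for unit, n in m.units.items())


def pick_mutation(candidates: List[Mutation], free_units: Dict[str, int],
                  weights: Dict[str, float]) -> Optional[Mutation]:
    """Highest-priority (cheapest) mutation that fits; None if nothing fits."""
    feasible = [m for m in candidates if fits(m, free_units)]
    return min(feasible, key=lambda m: cost(m, weights), default=None)


THR = Mutation("THR", delta_cycles=-1, delta_size=0, delta_mem=0, units={"ALU": 2})
SR = Mutation("StrengthReduction", delta_cycles=-1, delta_size=1, delta_mem=0, units={"ALU": 1})
GOAL = {"cycles": 1.0, "size": 0.5}
```

With `GOAL` as weights, THR is chosen when two ALUs are free; with only one free ALU it no longer fits and the strength-reduced version is chosen instead, so the same code adapts differently to different resource availability.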

4. Experiments 

We conducted some experiments to demonstrate the importance of customizing the 
optimization flow based on the architecture, the application and the design goals. 
While EXPRESS supports various phase orderings of all the optimizations, in this 
paper we focus on two very important transformations: If-Conversion and 
Speculative code motion. We performed experiments with ordering these two 
transformations along with other conventional optimizations such as Dead Code 
Removal. 

If-Conversion is a technique for converting control dependent operations into 
conditionally executed operations. This technique is very useful for predicated 
architectures that allow for conditional execution of operations based on the value of 
a Boolean source operand, referred to as the guarding predicate. If-conversion 
eliminates the branch instruction and converts control dependencies to data 
dependencies. As a result, the true and false branch basic blocks of an if-statement 
get merged into a larger basic block with greater parallelism. However, If-Conversion 
may increase the number of instructions that get executed dynamically, 
because instructions from both paths of the branch get executed. Furthermore, 
depending on the architecture, If-Conversion may also increase the code size. We 
consider two architectural choices for supporting predication: restricted (also known 
as Partial Predication) and aggressive (also known as Full Predication). In the 
restricted version, only a limited set of predicated instructions is available in the 
ISA. In our experiments, the Partially Predicated architecture only supports 
conditional moves, while the Fully Predicated architecture supports the guarded 
execution of all operations. 
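The difference between the two predication models can be made concrete with a toy if-conversion pass. The sketch below uses a hypothetical three-address representation and assumes that neither path contains stores or operations that can trap, and that the predicate itself is not redefined inside the if-statement. Full predication simply guards each operation; partial predication, with conditional moves only, computes both paths into fresh temporaries and then commits the results with `cmov` operations, which is where the extra instructions come from.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

ALU: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul}

Plain = Tuple[str, str, str, str]          # (dest, opcode, src1, src2)
Guard = Optional[Tuple[str, bool]]         # (predicate, required value) or None
Guarded = Tuple[Guard, str, str, str, str]


def run_branchy(cond: str, then_ops: List[Plain], else_ops: List[Plain],
                env: Dict[str, int]) -> Dict[str, int]:
    """Reference semantics: execute only the path selected by the branch."""
    env = dict(env)
    for dest, opc, a, b in (then_ops if env[cond] else else_ops):
        env[dest] = ALU[opc](env[a], env[b])
    return env


def if_convert_full(cond: str, then_ops: List[Plain], else_ops: List[Plain]) -> List[Guarded]:
    """Full predication: every operation is guarded by the branch predicate."""
    return ([((cond, True), *op) for op in then_ops]
            + [((cond, False), *op) for op in else_ops])


def if_convert_partial(cond: str, then_ops: List[Plain], else_ops: List[Plain]) -> List[Guarded]:
    """Partial predication (conditional moves only): both paths run unguarded
    into fresh temporaries, and conditional moves commit the results."""
    code: List[Guarded] = []
    commits: List[Guarded] = []
    for taken, ops in ((True, then_ops), (False, else_ops)):
        rename: Dict[str, str] = {}
        for dest, opc, a, b in ops:
            tmp = f"{dest}.{'T' if taken else 'F'}{len(code)}"
            code.append((None, tmp, opc, rename.get(a, a), rename.get(b, b)))
            rename[dest] = tmp
        move = "cmov_t" if taken else "cmov_f"
        commits += [(None, dest, move, tmp, cond) for dest, tmp in rename.items()]
    return code + commits


def run_straight_line(code: List[Guarded], env: Dict[str, int]) -> Dict[str, int]:
    """Execute if-converted code; no branch is taken."""
    env = dict(env)
    for guard, dest, opc, a, b in code:
        if guard is not None and bool(env[guard[0]]) != guard[1]:
            continue                                  # squashed by its predicate
        if opc == "cmov_t":
            if env[b]:
                env[dest] = env[a]
        elif opc == "cmov_f":
            if not env[b]:
                env[dest] = env[a]
        else:
            env[dest] = ALU[opc](env[a], env[b])
    return env


# if (p) { x = a + b; y = x * a; } else { x = a - b; }
THEN_OPS = [("x", "add", "a", "b"), ("y", "mul", "x", "a")]
ELSE_OPS = [("x", "sub", "a", "b")]
```

For this example, full predication produces 3 guarded operations while partial predication produces 6 (3 computations plus 3 conditional moves), and both reproduce the branch semantics for either value of `p`.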

The speculative code motion technique in EXPRESS is based on the TiPS 
(Trailblazing Percolation Scheduling) technique developed at UC Irvine. TiPS is a 
beyond-basic-block scheduling technique that attempts to extract the maximum DLP 
available in the application. TiPS has been shown to deliver good performance while 
limiting the code size explosion associated with most speculative code motion 
techniques. 
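Speculative code motion hoists an operation from one successor of a branch to a point above the branch. The sketch below is not TiPS; it only shows the generic legality conditions that any such technique has to check, one operation at a time. The set of non-trapping opcodes and the `Operation` type are assumptions, and a real scheduler would often rename the destination rather than reject the move when it is live on the other path.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Opcodes assumed to be free of side effects and unable to raise exceptions.
NON_TRAPPING = {"add", "sub", "mul", "and", "or", "xor", "shl", "shr", "mov"}


@dataclass(frozen=True)
class Operation:
    dest: str
    opcode: str
    srcs: FrozenSet[str]


def can_speculate(op: Operation, defined_before_op: Set[str], live_on_other_path: Set[str]) -> bool:
    """Can `op`, located in one successor of a branch, be hoisted above the branch?

    defined_before_op:  values computed earlier in the same successor block.
    live_on_other_path: values live on entry to the other successor.
    """
    if op.opcode not in NON_TRAPPING:
        return False   # loads, stores, divides, calls: side effects or possible traps
    if op.srcs & defined_before_op:
        return False   # an operand would not yet be available above the branch
    if op.dest in live_on_other_path:
        return False   # would overwrite a value the other path still needs
    return True
```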

The simulation architecture platform is a MIPS variant with 2 ALUs, a floating-point unit, 
a branch unit and a load/store unit. It accepts the MIPS ISA and supports both 
Partial and Full Predication. We assume that the latency of each operation is 1 cycle 
and that the branch misprediction penalty is 4 cycles. We chose MIPS as our 
experimental platform because of the wide variety of architecture styles that share the 
same ISA. The MIPS R4000 is a RISC processor, the MIPS R10000 is a superscalar with 
ILP, and the R12000 supports Partial Predication with conditional moves. The 
demonstrator benchmarks are control-intensive kernels with nested-if structures. 
These benchmarks have been chosen from the Trimaran [21] suite and from 
scientific computation benchmarks. 
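The trade-off these assumptions set up can be estimated with a back-of-the-envelope model. The sketch below is a simplification introduced here, not the model used in the experiments: it ignores data dependences within a path and takes an assumed issue width of 2, a caller-chosen probability of executing the then-path and a caller-chosen misprediction rate, together with the 1-cycle latency and 4-cycle misprediction penalty stated above.

```python
import math


def branchy_cycles(then_ops: int, else_ops: int, p_then: float, mispredict_rate: float,
                   width: int = 2, penalty: int = 4) -> float:
    """Expected cycles of an if-then-else kept as a branch: one cycle for the
    branch, the expected length of the executed path, and the expected
    misprediction penalty."""
    path = (p_then * math.ceil(then_ops / width)
            + (1 - p_then) * math.ceil(else_ops / width))
    return 1 + path + mispredict_rate * penalty


def predicated_cycles(then_ops: int, else_ops: int, width: int = 2) -> int:
    """Cycles after If-Conversion: both paths are issued, no branch remains."""
    return math.ceil((then_ops + else_ops) / width)
```

With three operations on each side, a 50% then-path probability and a 50% misprediction rate, the branch costs 5 expected cycles against 3 for the predicated code; with ten operations per side and 5% mispredictions, the branch costs 6.2 cycles against 10, so If-Conversion no longer pays off.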

Table 1: Phase Ordering for Partial Predication

Partial Pred. | Pred.         | Spec.         | Pred. & Spec.
Bench         | Spdup   Size  | Spdup   Size  | Spdup   Size
Dag           | 1.00    1.00  | 1.4     0.94  | 1.26    1.00
Ifthen        | 1.00    1.00  | 1.22    0.95  | 1.28    1.00
Hyper         | 1.00    1.00  | 0.98    1.00  | 1.17    1.00
Minloc        | 1.00    1.00  | 1.12    1.10  | 1.32    1.00


Table 1 presents the speedup and code size obtained on the Partial Predication 
model. The second and third columns present the normalized speedup and code size 
(respectively) after If-Conversion alone. The next two columns present the speedup 
and code size for Speculation alone, relative to If-Conversion. The last two 
columns present the speedup and code size obtained by performing Speculation after 
If-Conversion. As the table shows, the optimal ordering of these 
transformations depends on the application and on the compilation goals. For 
example, for the ifthen benchmark, Speculation alone performs comparably to 
Predication followed by Speculation while yielding lower code size. This 
is because, in the Partial Predication model, many conditional moves are inserted 
during If-Conversion, which increases code size and reduces 
parallelism. For the minloc benchmark, however, Speculation alone suffers from code size 
explosion and lower performance compared to Predication followed by Speculation. There 
is no difference in code size between Predication alone and Predication followed by 
Speculation, because Predication converts if-statements into straight-line code and thus prevents 
code explosion during the Speculation phase. 
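The dependence of the best ordering on the compilation goal can be read directly off Table 1. The sketch below encodes the table's numbers and picks an ordering per benchmark for two example goals; the goal definitions and the `best_ordering` helper are assumptions for illustration, not part of EXPRESS.

```python
from typing import Dict, Tuple

# (normalized speedup, normalized code size) per ordering, from Table 1.
TABLE1: Dict[str, Dict[str, Tuple[float, float]]] = {
    "Dag":    {"Pred.": (1.00, 1.00), "Spec.": (1.4, 0.94),  "Pred. & Spec.": (1.26, 1.00)},
    "Ifthen": {"Pred.": (1.00, 1.00), "Spec.": (1.22, 0.95), "Pred. & Spec.": (1.28, 1.00)},
    "Hyper":  {"Pred.": (1.00, 1.00), "Spec.": (0.98, 1.00), "Pred. & Spec.": (1.17, 1.00)},
    "Minloc": {"Pred.": (1.00, 1.00), "Spec.": (1.12, 1.10), "Pred. & Spec.": (1.32, 1.00)},
}


def best_ordering(bench: str, goal: str, max_size: float = float("inf")) -> str:
    """Pick a phase ordering for `bench`.

    goal == "speed": highest speedup among orderings whose size is <= max_size.
    goal == "size":  smallest code size, ties broken by higher speedup.
    Raises ValueError if no ordering satisfies max_size.
    """
    options = {k: v for k, v in TABLE1[bench].items() if v[1] <= max_size}
    if goal == "speed":
        return max(options, key=lambda k: options[k][0])
    return min(options, key=lambda k: (options[k][1], -options[k][0]))
```

Under these definitions, the speed goal selects Pred. & Spec. for ifthen and minloc but Spec. for dag, while the size goal selects Spec. for ifthen, in line with the discussion above.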



Table 2: Phase Ordering for Full Predication

Full Pred.    | Pred.         | Spec.         | Pred. & Spec.
Bench         | Spdup   Size  | Spdup   Size  | Spdup   Size
Dag           | 1.00    1.00  | 1.26    1.09  | 1.15    1.00
Ifthen        | 1.00    1.00  | 1.11    1.08  | 1.32    1.00
Hyper         | 1.00    1.00  | 0.98    1.06  | 1.18    1.00
Minloc        | 1.00    1.00  | 1.15    1.08  | 1.23    1.05



Table 2 presents the speedup and code size obtained on the Full Predication 
model. We observe slight variations in the speedup and code size numbers 
compared to the Partial Predication model. In particular, Speculation alone always 
results in a code size increase of 6-10% compared to Predication alone. This is 
because If-Conversion does not insert any conditional moves and instead 
converts the conditional operations into their predicated counterparts. Once again, 
however, the optimal ordering of these phases depends strongly on the 
application and the design goal. The EXPRESS customizable compiler, which allows 
dynamic ordering of the phases, is therefore very useful and can provide significant 
advantages over predetermined static phase orderings. 



5. Summary 

Software generation for embedded systems is very complex because of the wide 
variety of architectural styles, diverse application domains and design goals. In this 
paper we present a customizable retargetable compiler framework that determines 
the phase ordering between transformations dynamically, based on resource 
availability and program region characteristics. We present experiments 
with ordering If-Conversion (a predicated execution technique) and speculative 
code motion. The results indicate that flexibility in the ordering of 
transformations is important when compiling for embedded systems. Our future 
work includes performing experiments that explore various phase orderings and 
incorporating more transformations into the EXPRESS compiler. 
1. Introduction 

Designers of today’s telecommunication products, such as cellular phones and wireless 
and networking devices, face rapidly growing system complexity. Driven by 
advances in semiconductor technology and the need for new telecommunication 
applications, the amount of functionality realized on a SOC is increasing 
enormously. The architectures for these applications are truly heterogeneous 
multiprocessors, including hardware/software and digital/analog parts. 

The growing complexity of these systems requires design tools and methods to 
work at increasingly higher levels of abstraction. Abstraction helps manage 
complexity because systems can be specified by their behavior instead of their 
structure. New methods are needed to handle the design and validation of these 
systems, where different languages, tools and models need to be used for full system 
specification, simulation and design. Systems must be described and specified 
formally if they are to be analyzed or manipulated by other design tools. Simulation, 
i.e., execution of the specification, helps users validate a specification with respect to 
both functionality and properties. Designs can be synthesized and transformed from 
abstract specifications into physical implementations. Verification tools analyze 
designs and requirements and provide assurance that designs have been properly 
implemented and will always work correctly. 
This chapter presents a methodology and framework for system design, validation 
and fast prototyping of advanced telecommunication multiprocessor System-On-Chip. 
The presented methodology covers multi-language functional 
specification, multi-level validation, algorithm and multiprocessor architecture 
exploration, refinement, software and hardware targeting, and final prototype 
generation. The framework combines different technologies such as Matlab™, 
SDL and a codesign tool, as well as other related tools, to support the entire development 
cycle. A case study on a real-world communication device demonstrates 
the effectiveness of the methodology and shows how the framework was applied to this 
real industrial product to increase the productivity and quality of the SOC design 
effort. 

The remainder of this chapter is organized as follows. Section 2 gives an 
overview of our global system-level design methodology. Section 3 presents the 
framework for system validation and prototyping of digital systems. Section 4 
describes the specification and design of a 2.4 kbit/s LPC vocoder and highlights the 
results obtained. Section 5 discusses the results and presents the lessons learned. 
Section 6 summarizes the chapter. 



2. Overview of a generic methodology 

The methodology starts from a high-level specification describing the functionality 
of each subsystem as a hierarchical mixed data/control flow diagram. Note that the 
description can already contain components from a reuse library. The design process 
starts with the designer creating a functional specification of each subsystem. The 
aim is to validate and explore the algorithms and functionality of the system by system-level 
simulation (i.e. co-simulation), including the validation of the individual 
subsystems and of the full system. The various algorithms may be explored through 
simulation. Once the system functionality and algorithms are validated by 
system-level simulation, we additionally use a back-annotation 
approach. At this stage we obtain a system-level specification associated with back-annotated 
performance models. The aim is to explore different architectures through 
system-level simulation. Thanks to this time-annotated specification, it is 
possible to predict the performance of all feasible architecture solutions with a good 
trade-off between speed and accuracy. Once an architecture is chosen, the 
functional specification is mapped onto an architectural specification. The digital part 
of the system is partitioned into hardware and software components. Other 
subsystems may also be refined. The architectural specification may include one or 
more processors and other components. At this stage we obtain a technology-independent 
multiprocessor architecture, which is validated through fast co-simulation 
of the abstract architecture model. Once an architecture is 
validated, targeting proceeds from the top-level SoC architecture down to the 
individual components contained on the SoC. In this stage, the architectural model is 
more detailed, and cycle-true validation is obtained through slow co-simulation. 
3. Framework for system validation and prototyping 

In order to support the entire development cycle of telecommunication systems, the 
framework combines different technologies such as the Matlab and SDL languages, co-design 
tools, a performance estimator and a co-simulation environment. The framework 
starts from Matlab and produces a heterogeneous multiprocessor system-on-chip. As 
shown in Figure 1, the design process in this environment contains four major parts: 
a powerful system-level specification environment based on Matlab; codesign starting 
from SDL (including design space exploration, refinement steps and functional 
architecture generation); multiprocessor SoC generation (including prototyping, 
interface synthesis, RTOS generation and architecture mapping); and a co-simulation 
environment which provides analysis and simulation for each of the design models. 




Figure 1: Framework for system validation and prototyping of digital systems 

3.1. Algorithm simulation-exploration platform 

Many telecommunication systems consist of signal-processing and control-dominated 
parts. In order to specify the full functionality of the system efficiently, we propose 
to use two tools: Matlab and Stateflow. The signal-processing part is effectively modeled 
using Matlab, whereas Stateflow is better suited to the control logic. Matlab is an 
intuitive language and technical computing environment which provides core 
mathematics, engineering functions and advanced graphical tools for data analysis, 
visualization, and algorithm and application development. It is well suited to the 
representation of data-flow-oriented designs. Stateflow provides a key solution for 
designing the control or protocol logic found in embedded systems. Simulink is a 
simulation environment which provides a block diagram interface built on 
core Matlab. It also provides an elegant way to integrate Stateflow blocks into such 
designs. 

The algorithm simulation-exploration platform is an integrated product suite that 
combines algorithm design, block diagram simulation, code generation, and analysis 
in a single, interactive environment. Based on Matlab, Stateflow and Simulink, the 
platform shortens development cycles and reduces the risk of design errors by 
addressing many of the engineering problems that designers encounter every day, including 
ambiguous design specifications, burdensome amounts of manual re-coding, 
the transfer of models from one design tool to another, and 
expensive delays and reiterations caused by design errors. It enhances designers' 
ability to investigate new research ideas and design custom solutions to complex 
problems. 

The first step a designer might take in creating a high-level executable 
specification of a telecommunication system using the simulation-exploration platform is 
to model the algorithms as a hierarchical mixed data/control flow diagram. The 
second step is to find the most appropriate algorithm for each part of the system and to 
validate the full system specification. Assuming that the model behaves as expected, 
the designer then chooses the appropriate architecture and decides whether to implement 
the system in hardware or software. These issues are addressed in the following 
sections. 

3.2. Architecture exploration, refinement steps and code generation 

In order to use the SDL-based codesign tool and architecture exploration approach, 
the initial high-level executable specification will be transcribed to SDL. 

3.2.1. Architecture exploration. The architecture exploration technique is used 
here to provide feedback, which guides the search for good architectural solutions. 
The main issue is to find the most appropriate multiprocessor architecture, including 
determination of the right partition between hardware and software components and 
selection of the right components, as well as the best communication protocols. The 
approach combines both system and RT levels, and uses a hybrid analytic/dynamic 
model, which gives a good trade-off between speed and accuracy. The architecture 
exploration approach makes use of the system-level simulator and the 
hardware/software codesign tool. Figure 2 shows the four steps of this approach: 
calculation of the basic delays, back-annotation, selection of the architecture and 
simulation. 
Figure 2. Architecture Exploration approach. 
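The four steps can be illustrated with a deliberately simplified, purely analytic sketch. It does not reproduce the hybrid analytic/dynamic model of the approach: the basic delays, the two processors (`DSP`, `MCU`), the transfer cost and the throughput-style estimate (busiest processor plus inter-processor transfers) are all hypothetical, and the final step replaces simulation with an exhaustive evaluation of the estimate.

```python
from itertools import product
from typing import Dict, List, Tuple

# Step 1 (hypothetical numbers): basic delay, in cycles, of each function on each processor.
BASIC_DELAY: Dict[str, Dict[str, int]] = {
    "analysis":   {"DSP": 300, "MCU": 1200},
    "pitch":      {"DSP": 250, "MCU": 600},
    "excitation": {"DSP": 80,  "MCU": 100},
    "synthesis":  {"DSP": 200, "MCU": 900},
}
EDGES: List[Tuple[str, str]] = [("analysis", "pitch"), ("pitch", "excitation"),
                                ("excitation", "synthesis")]
COMM_COST = 50  # assumed cycles per data transfer between the two processors


def estimate(mapping: Dict[str, str]) -> int:
    """Steps 2-3: back-annotate the basic delays onto a candidate mapping and
    estimate the per-frame cost as the busiest processor's load plus the
    cost of every data transfer that crosses processors."""
    load: Dict[str, int] = {}
    for fn, proc in mapping.items():
        load[proc] = load.get(proc, 0) + BASIC_DELAY[fn][proc]
    comm = sum(COMM_COST for src, dst in EDGES if mapping[src] != mapping[dst])
    return max(load.values()) + comm


def explore() -> Tuple[Dict[str, str], int]:
    """Step 4 stand-in: evaluate every mapping and keep the cheapest one."""
    fns = list(BASIC_DELAY)
    mappings = (dict(zip(fns, procs)) for procs in product(["DSP", "MCU"], repeat=len(fns)))
    best = min(mappings, key=estimate)
    return best, estimate(best)
```

With these made-up numbers, the cheapest mapping places only the pitch function on the MCU, for an estimated 700 cycles per frame; different basic delays give a different answer, which is why the approach back-annotates measured delays rather than guessed ones.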
3.2.2. Refinement steps and code generation. The process starts from the system-level 
specification language SDL and produces a functional prototype architecture 
composed of hardware and software components. The required system functionality 
is specified in SDL. The second step consists of hardware/software partitioning. The 
partitioning represents the mapping of functional subsystems onto abstract 
processors. The third step consists of communication refinement, including protocol 
selection and primitive insertion. The next step examines design alternatives to 
identify those that meet the system constraints; the architectural choices are made 
with the assistance of the architecture exploration technique. Finally, the codesign tool 
generates the functional prototype (including C and/or VHDL 
code generation), which is validated through co-simulation. 

3.3. Targeting and multiprocessor SoC generation 

Targeting proceeds from the top-level SoC architecture down to the individual 
components contained on the SoC. The software is partitioned into low-level device 
drivers interfacing with hardware components, a real-time operating system (RTOS) 
and protocol stacks, and application software. The main step here is the 
design of the communication. The communication interfaces and interconnect 
components are realized in dedicated hardware. The SoC architecture may contain 
memory models, processors and peripherals, and application-specific co-processors. 
In this stage, the architectural model is more detailed, and cycle-true validation is 
obtained through slow co-simulation. For processors, a bus functional model (BFM) is used in 
conjunction with an instruction set simulator (ISS). The hardware (RTL VHDL) is 
executed by a VHDL simulator (VSS). The co-simulation environment ensures the 
synchronization of the running simulators for coherent execution of the overall 
system. 
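The synchronization the co-simulation environment has to provide can be sketched as conservative lock-step execution. The `Simulator` class below is a hypothetical stand-in for an ISS or a VHDL simulator; only its notion of time and the exchange of messages at synchronization points are modeled, and the message payloads are invented. A quantum of 1 cycle corresponds to cycle-true synchronization; larger quanta reduce the number of synchronizations at the cost of delivering messages later.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str, int, str]   # (delivered_at, sender, sent_at, payload)


class Simulator:
    """Stand-in for an ISS or a VHDL simulator. It only replays a fixed,
    hypothetical schedule of outgoing messages (cycle -> payload)."""

    def __init__(self, name: str, schedule: Dict[int, str]) -> None:
        self.name = name
        self.time = 0
        self.schedule = dict(schedule)
        self.received: List[Message] = []

    def run_until(self, t: int) -> List[Tuple[int, str]]:
        """Advance local time to cycle t; return messages emitted on the way."""
        out = [(c, p) for c, p in sorted(self.schedule.items()) if self.time <= c < t]
        self.time = t
        return out


def cosimulate(sims: List[Simulator], end: int, quantum: int) -> int:
    """Lock-step co-simulation: no simulator runs more than `quantum` cycles
    ahead of the others, and messages are exchanged only at synchronization
    points. Returns the number of synchronization points."""
    t, syncs = 0, 0
    while t < end:
        t = min(t + quantum, end)
        produced = {s.name: s.run_until(t) for s in sims}
        for s in sims:
            for sender, msgs in produced.items():
                if sender != s.name:
                    s.received += [(t, sender, c, p) for c, p in msgs]
        syncs += 1
    return syncs


iss = Simulator("ISS", {3: "write CTRL"})
vss = Simulator("VSS", {5: "irq"})
```

With a quantum of 4 cycles, the interrupt the VSS stand-in emits at cycle 5 reaches the ISS only at the synchronization point at cycle 8, and a 10-cycle run needs 3 synchronizations.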



4. Case study 

To demonstrate the efficiency of the fast prototyping system for telecommunication 
applications, we designed a vocoder system. The most prominent vocoder standard is 
the U.S. Government Linear Predictive Coding vocoder (LPC-10) standard, which 
operates at 2.4 kbps and has been widely used in military and wireless 
communication. The following sections describe the basic principles and key 
features of the LPC vocoder, including the functional structure and signal-processing flow 
of each major module, as well as the overall design/validation cycle of the 2.4 kbps 
LPC vocoder. 

The model assumes that speech is the result of exciting linear time-varying 
filters (the LPC filter and the pitch filter) with a source signal. The 
excitation source signal is modeled either as a periodic impulse train for voiced 
speech such as vowel sounds, or as random noise for unvoiced speech such as consonants. 
Figure 3 shows the functional block diagram of the vocoder. The vocoder is 
composed of two major parts, a transmitter and a receiver, each of which contains 
several algorithms. In the transmitter, the original speech is partitioned into 20-ms 
frames, each consisting of 160 samples at a sampling rate of 8 kHz. The transmitter 
analyzes the original speech and extracts a set of parameters that represent a 
source-filter model. These parameters are then transmitted to the 
receiver, where a synthesizer reconstructs the speech from the received 
parameters. 
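The source-filter model just described can be sketched in a few lines. The sketch assumes the common convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p for the LPC polynomial, folds the pitch filter into the impulse-train period, and restarts the filter state at every frame, which a real synthesizer would not do; the helper names are hypothetical. Only the 8 kHz rate and the 20-ms, 160-sample frames come from the text.

```python
import random
from typing import List

FS = 8000                        # sampling rate in Hz (from the text)
FRAME_LEN = FS * 20 // 1000      # 20-ms frame -> 160 samples


def excitation(voiced: bool, pitch_period: int, n: int = FRAME_LEN, seed: int = 0) -> List[float]:
    """Impulse train with the pitch period for voiced frames, uniform white
    noise for unvoiced frames."""
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def synthesize(exc: List[float], a: List[float], gain: float) -> List[float]:
    """All-pole synthesis filter G / A(z): s[n] = G*e[n] - sum_k a_k * s[n-k].
    Filter memory starts at zero for every call."""
    out: List[float] = []
    for n, e in enumerate(exc):
        acc = gain * e
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * out[n - k]
        out.append(acc)
    return out
```

For example, a voiced frame with a 40-sample pitch period (200 Hz) contains four impulses, and the first-order filter with a_1 = -0.5 turns a single impulse into the decaying response 1, 0.5, 0.25.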




Figure 3. The functional block diagram of the LPC 2.4-kbps vocoder 

The Simulink specification of the LPC vocoder is shown in Figure 4(a). In the 
transmitter, as shown in Figure 4(b), each frame of input speech goes through a high-pass 
filter and a Hamming window filter before the LPC analysis filter algorithm is 
performed. The goal of the LPC analysis filter is to find a set of optimal filter 
coefficients, in the sense of the least mean squared residual error. In the LPC 
filtering, the autocorrelation coefficients of the input speech are calculated, then 
Durbin’s recursion algorithm is used to compute 10 optimal LPC coefficients and the 
gain. After LPC analysis, the next step is the pitch search, which is performed by the pitch 
detector block and includes pitch period computation and voiced/unvoiced speech 
detection. Figure 4(c) shows the receiver Simulink specification. The excitation block 
builds an impulse train with the pitch period when the speech is voiced, 
or random noise when it is unvoiced. This excitation is fed into the 
inverse filter built from the LPC coefficients and gain, and the 
LPC inverse filter outputs the reconstructed samples. 
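The LPC analysis chain (Hamming window, autocorrelation, Durbin's recursion) can be sketched as follows. The high-pass pre-filter is omitted, the coefficient sign convention matches A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and taking the gain as the square root of the final prediction-error energy is one common choice, assumed here rather than taken from the vocoder's specification.

```python
import math
from typing import List, Tuple


def hamming(frame: List[float]) -> List[float]:
    """Apply a Hamming window w[i] = 0.54 - 0.46 cos(2 pi i / (N - 1))."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Durbin's recursion. Returns a_1..a_p of A(z) = 1 + sum a_k z^-k and the
    final prediction-error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(frame: List[float], order: int = 10) -> Tuple[List[float], float]:
    """Window, autocorrelate and run Durbin's recursion; gain = sqrt(error)."""
    r = autocorrelation(hamming(frame), order)
    if r[0] == 0.0:
        return [0.0] * order, 0.0           # silent frame: nothing to predict
    a, err = levinson_durbin(r, order)
    return a, math.sqrt(max(err, 0.0))
```

For the autocorrelation sequence 1, 0.5, 0.25 (a first-order process), the recursion returns a_1 = -0.5, a_2 = 0 and an error energy of 0.75, i.e. the second coefficient adds nothing.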

4.2. LPC-Vocoder simulation-exploration platform 

Figure 5 shows the simulation-exploration platform of the LPC vocoder based on the 
Simulink environment. It supports algorithm exploration, including algorithm 
development, analysis and validation. In our case, we tried three pitch extraction 
algorithms: autocorrelation, center clipping and SIFT. Model 
execution has three phases. In the reset phase, parameter values are obtained and 
checked and the model is initialized. In the main phase of execution, the model reads 
and writes data. In the final phase, a wrap-up occurs, freeing resources, writing 
final results, etc. All block instances are reset at the beginning of a simulation run, 
and all are “wrapped up” at the end. 
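As an illustration of the first two pitch extraction methods, the sketch below combines center clipping with an autocorrelation peak search and a simple voiced/unvoiced decision. The clipping ratio, the 60-400 Hz search range and the voicing threshold are assumptions rather than values from the platform, and the SIFT method is not shown.

```python
from typing import List, Tuple


def center_clip(x: List[float], ratio: float = 0.3) -> List[float]:
    """Zero samples below ratio * peak and shift the rest toward zero, which
    suppresses formant structure before the autocorrelation."""
    c = ratio * max(abs(v) for v in x)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in x]


def detect_pitch(frame: List[float], fs: int = 8000, fmin: int = 60, fmax: int = 400,
                 voicing_threshold: float = 0.3) -> Tuple[bool, int]:
    """Return (voiced, pitch period in samples); the period is 0 if unvoiced."""
    x = center_clip(frame)
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return False, 0                      # silence
    best_lag, best_r = 0, 0.0
    for lag in range(fs // fmax, min(fs // fmin, len(x) - 1) + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    voiced = best_r / r0 >= voicing_threshold
    return (True, best_lag) if voiced else (False, 0)
```

For a 160-sample impulse train with a 40-sample period, `detect_pitch` returns `(True, 40)`, i.e. a pitch of 200 Hz at 8 kHz; an all-zero frame is classified as unvoiced.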

In this platform we can either record our own voice via a microphone or load 
samples of prerecorded speech. The next steps are the LPC and the pitch analysis. 
Both the set of LPC coefficients and the pitch values are then stored in the 
parameter memory. These parameters are needed to control the synthesis part of the 
vocoder, which is shown in the lower part of the diagram. 
Figure 4. LPC 2.4kbps-Vocoder Simulink specification 

The pitch values (pitch contour) and the number of prediction coefficients can be 
changed, and these changes have a significant influence on the reconstructed speech. 
We can replay the signal and compare it with the reference speech signal. For visual 
comparison, the reference speech signal and the reconstructed speech signal are 
displayed in both the time and frequency domains. For the reconstructed signal, 
the pitch frequency contour is also presented graphically, and we can directly manipulate 
this contour. The main advantage of the simulation-exploration platform is its 
numerous interactive functions. 




Figure 5. Simulation-exploration platform of LPC-vocoder 
4.3. Transcription of the Simulink specification in SDL 

During the previous phase, the Simulink specification of the vocoder was carried 
out and validated. We point out that the refinement steps in our framework support 
only SDL specifications. In order to proceed to the codesign and architecture 
exploration phases, the initial specification is translated into SDL. We adopted 
a systematic method to translate the initial specification into SDL in order to 
preserve the same structure. Each Simulink block is decomposed into SDL 
blocks or SDL processes. The SDL and Simulink specifications have the same 
block hierarchies and sub-blocks; the only 
difference appears in the SDL processes. As shown in Figure 6, the 
terminal Simulink blocks are translated into processes. This 
specification is validated using an SDL simulator. 
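The translation rule above (hierarchical Simulink blocks become SDL blocks, terminal blocks become SDL processes) can be sketched as a recursive walk. The `SimulinkBlock` class and the decoder hierarchy are hypothetical, and the output is only an SDL/PR-like structural skeleton without signals, channels or process bodies; it has not been checked against a particular SDL version.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulinkBlock:
    name: str
    children: List["SimulinkBlock"] = field(default_factory=list)


def to_sdl(block: SimulinkBlock, depth: int = 0) -> List[str]:
    """Hierarchical Simulink subsystems become SDL blocks; terminal Simulink
    blocks become SDL processes. Only the structural skeleton is emitted."""
    pad = "  " * depth
    if not block.children:
        return [f"{pad}PROCESS {block.name};", f"{pad}ENDPROCESS {block.name};"]
    lines = [f"{pad}BLOCK {block.name};"]
    for child in block.children:
        lines += to_sdl(child, depth + 1)
    lines.append(f"{pad}ENDBLOCK {block.name};")
    return lines


# Hypothetical decoder hierarchy, loosely following the receiver in Figure 4(c).
decoder = SimulinkBlock("Decoder", [
    SimulinkBlock("Excitation", [SimulinkBlock("ImpulseTrain"), SimulinkBlock("NoiseSource")]),
    SimulinkBlock("Synthesis", [SimulinkBlock("InverseFilter")]),
])
```

Printing `"\n".join(to_sdl(decoder))` yields two nested SDL blocks under `Decoder` and three processes at the leaves, mirroring the block hierarchy as described.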




Figure 6. SDL specification of LPC 2.4kbps Decoder 

4.4. LPC-Vocoder refinement steps and code generation 

The next steps refine this new system specification. In the first step, 
we translated the SDL specification into an intermediate format containing 
extended FSMs, in order to apply system refinement transformations. This translation 
is performed automatically. We then used interactive partitioning to 
apply transformations such as reordering blocks and processes 
within the hierarchy and merging them into a single block or process. We chose to 
implement all vocoder modules in software. We then generated six C 
code blocks (i.e. processes). We assigned two processes to one abstract 
processor and four processes to another. After this step, it becomes possible to 
co-simulate this functional prototype in order to validate the above 
refinement steps. 

4.5. LPC-Vocoder SoC generation 

We chose two kinds of processors: an ARM7 micro-controller and a TMS320C31 DSP. 
The partitioning and processor assignment for this architecture are illustrated 
in Figure 7. The analysis filter and the inverse filter blocks are implemented in 
software on the TMS320C31 DSP. The voice handset and communications protocol, 
pitch detector, voiced/unvoiced decision, and excitation generator blocks are 
implemented in software on the ARM7. The communication interfaces and interconnect 
components are realized in dedicated hardware. The virtual prototype is validated 
through cycle-accurate co-simulation. For processors, a BFM has been used 
in conjunction with an instruction set simulator (ISS). For hardware, RTL VHDL has 
been used and executed by a VHDL simulator (VSS). Figure 8 shows the cycle-accurate 
co-simulation of the LPC-Vocoder system. 




Figure 7. The SoC prototype of the LPC-Vocoder. 



5. Result analysis and lessons learned 

Table 1 summarizes the simulation and co-simulation times and the design times. The 
Simulink description, including algorithm exploration and system-level simulation of 
the LPC-Vocoder, took less than 1 week, whereas the SDL specification and validation took 
less than 3 weeks. The difference is due to the lack of arithmetic operations for performing 
signal-processing functions in SDL. The architecture exploration and prototype 
generation and validation took less than 1 week. Regarding simulation speed, at 
the system-level specification, the Simulink model simulates 5 times 
faster than the SDL specification. The difference is mainly due to the overhead of the 
procedures dedicated to computing the mathematical functions. The co-simulation of 
the functional prototype is 24 times faster than cycle-accurate co-simulation. This 
difference clearly demonstrates the benefit of the functional-prototype validation phase. 
The simulator of the TMS320C30 runs at a speed of 1 million cycles per 
second, and that of the ARM7 at 1.4 million cycles per second. 

This experiment shows that Simulink is well suited for the system-level specification and algorithm exploration of DSP applications (i.e., easy description, faster simulation, and powerful algorithm exploration, thanks to the Matlab toolboxes). However, the main restriction at present is the lack of design automation tools starting from Simulink. From the SDL point of view, this experiment clearly shows the efficiency of the SDL-based design tool, including architecture exploration. However, the experiment also shows that it is very hard to describe DSP applications in SDL, where the set of predefined arithmetic operations is quite restricted. Of course, SDL remains well suited for the specification of telecommunication protocols.
Table 1. Simulation and co-simulation times of the LPC-Vocoder.




6. Conclusion 

In this chapter, we have presented a methodology and framework for system design, validation and fast prototyping of advanced telecommunication multiprocessor Systems-on-Chip. The presented approach takes into account system-level specification, multi-level validation, both algorithm and architecture exploration, refinement, functional architecture generation, software and hardware design, and final prototype generation. Our design framework combines different technologies, such as Matlab, SDL and a codesign tool, as well as other related tools, to support the whole development cycle. The originality of this approach lies mainly in the combination of algorithm and architecture exploration in a single design flow. We have also presented the design and validation of the 2.4 kbps LPC vocoder using our framework. The successful design and the results of this case study demonstrate the effectiveness of our approach.
The design of embedded systems has to address several interacting design aspects, so-called dimensions, to capture parallelism, distribution over different locations and hard real-time requirements. Thus, a structured design process has been established with the PARADISE design environment. The design process covers all steps from behavioral specification to final chip realization. In this paper, we describe how system specification and refinement are covered in combination with the processes available in PARADISE. An example of an adequate specification and modeling language is considered and adapted for integration into PARADISE. First results show the feasibility of integrating the respective concepts.



1. Introduction 

The design of embedded systems (ES) today has to address several interacting design dimensions to implement parallelism, distribution over different locations, and hard real-time (RT) requirements. Consequently, a modern, structured design process has to start with a system specification, and it has to deal with heterogeneous requirements and restrictions. In particular, the integration of dedicated flows for HW-design and SW-design, as well as RT operating system issues, into the ES design flow is a major challenge. The PARADISE^ design environment provides tools for



^ Design Environment for PARAllel, Distributed Embedded real-time systems 
behavioral specification, run-time analysis, and system synthesis. An additional need is the evaluation of application-specific characteristics at the system specification level. PARADISE is an open design environment for parallel, distributed embedded real-time systems. The basis of this common design environment is a highly structured design view called the P-chart.

This paper shows the extension of PARADISE with the specification language 
SpecC in order to support systematic refinement from a system specification. SpecC 
is a system-level design methodology and specification language developed at the 
University of California, Irvine [Ga+00]. 

Section 2 gives an overview of the PARADISE design environment followed by 
an overview of SpecC in Section 3. An example implemented in SpecC was chosen to demonstrate the extension of PARADISE (Section 4). The extended
design methodology is presented in Section 5. The results for the example are shown 
in Section 6. Finally, Section 7 draws some conclusions and summarizes the paper. 



2. PARADISE Design Environment 

The PARADISE [HaReK199] [Re+00] design environment combines different 
design dimensions. Design dimensions needed for establishing a design methodology 
for today’s ES are: 

• specification 

• modeling 

• analysis 

• verification 

• RT SW-synthesis 

• RT operating systems 

• HW-synthesis 

• rapid prototyping 

Each design dimension covers all levels of abstraction. A very good basis for the structuring of the HW-design domain has been suggested by Gajski [Ga88]. The so-called Y-chart distinguishes behavioral, structural, and geometrical design views. Design views are applied to different abstraction levels. The Y-chart differentiates between five hierarchical layers: algorithmic, register-transfer, gate, symbolic layout, and electrical layout. For testability, a test view has been introduced as a fourth design view, leading to the X-chart [Ra89]. The abstraction of this basic structure leads to the P-chart design view first presented in [HaReK199]. The P-chart design view applies the X-structure to each design domain separately. Based on this abstract structure, domain-specific methods and tools can be integrated. The abstract P-chart structuring is illustrated in Figure 1.

The different layers of abstraction (algorithm to layout) of the Y-chart are 
depicted for each of the eight design dimensions. In addition, each dimension is 
structured by the four views of the X-chart. Based on this concept, a variety of 
automation tools for different design domains and levels have been integrated within 
the PARADISE design environment. Thus, PARADISE can be understood as an implementation of a generally applicable, integrated design methodology for today’s ES. A special Internet-based communication service allows remote access to each tool in a distributed environment [Astair99].
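The P-chart organisation described above can be pictured as a three-dimensional grid in which every design dimension is crossed with the four X-chart views and the five Y-chart abstraction levels. The Python sketch below only illustrates that structure. The dictionary layout, the functions `register_tool` and `tools_in_dimension`, and the idea of storing tool names per cell are hypothetical and are not taken from the PARADISE implementation.

```python
from itertools import product

# The eight design dimensions listed in the text.
DIMENSIONS = [
    "specification", "modeling", "analysis", "verification",
    "RT SW-synthesis", "RT operating systems", "HW-synthesis", "rapid prototyping",
]
# Four X-chart views: the three Y-chart views plus the test view.
VIEWS = ["behavioral", "structural", "geometrical", "test"]
# Five Y-chart abstraction levels.
LEVELS = [
    "algorithmic", "register-transfer", "gate", "symbolic layout", "electrical layout",
]

# Hypothetical tool registry: one (initially empty) tool list per P-chart cell.
p_chart = {cell: [] for cell in product(DIMENSIONS, VIEWS, LEVELS)}


def register_tool(tool, dimension, view, level):
    """Attach a tool name to one cell; a KeyError signals an unknown cell."""
    p_chart[(dimension, view, level)].append(tool)


def tools_in_dimension(dimension):
    """Collect all tools registered anywhere in one design dimension."""
    found = set()
    for (dim, _view, _level), tools in p_chart.items():
        if dim == dimension:
            found.update(tools)
    return sorted(found)
```

With 8 dimensions, 4 views and 5 levels, the grid has 160 cells. For example, `register_tool("PMOSS", "HW-synthesis", "behavioral", "algorithmic")` would file the high-level synthesis tool under the HW-synthesis dimension. The exact cell chosen here is only illustrative.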



Figure 1. PARADISE design environment (recoverable figure labels: specification, analysis, SW-synthesis, operating system, rapid prototyping)



3. SpecC Language and Methodology 

Due to the increasing demand for analysis and evaluation of application 
specifications on the system level, the SpecC specification language has been 
introduced into the PARADISE design environment. SpecC allows the designer to analyze functional aspects of a system-level application specification through simulation or rapid prototyping in a very early design phase. The use of SpecC within the PARADISE design environment accelerates the design process, which is important in order to meet today’s time-to-market constraints.

The SpecC specification language satisfies all the requirements for a codesign 
language and supports structural and behavioral hierarchy, concurrency, state 
transitions, exception handling, timing and synchronization in an explicit and 
orthogonal way. SpecC encourages reuse and supports integration of IPs. Since 
SpecC is a superset of ANSI-C, a large library of already existing algorithms can be 
used directly. A system design modeled in SpecC is executable, modular and 
complete. As a result, SpecC fulfills all of the requirements of a system specification 
language for the specification and modeling design dimensions on the different 
levels in PARADISE. 
The SpecC-based design of complex systems, for example SoCs, is the process of
implementing a desired functionality using a set of physical components. This 
process must begin with a specification of the desired functionality. The SpecC 
design methodology [Ga+00] starts with an executable specification as shown in 
Figure 2. This initial specification model describes the functionality as well as the 
performance, power, cost and other constraints of the design. The specification does 
not make any premature allusions to implementation details. During the specification 
of the desired functionality the designer has the ability to reuse existing code 
segments, functions or procedures by instantiating them out of an algorithm library. 



Figure 2. The SpecC methodology (recoverable figure labels: synthesis flow, validation flow, implementation, compilation)

The system-level synthesis flow of the SpecC design methodology consists of two major tasks: architecture exploration and communication synthesis. Through a series of well-defined steps, the initial specification is gradually mapped onto a target architecture. Architecture exploration, which refines the specification of the design into an architecture model, includes the design steps of allocation of processing components and busses, partitioning of behaviors, communication channels and variables, and scheduling.

The next step in the design flow is communication synthesis, which refines the 
abstract communication between behaviors in the architecture model into an 
implementation over the wires of system busses. The task of communication 
synthesis includes insertion of communication protocols, synthesis of interfaces and 
transducers, and inlining of protocols into synthesizable components. In the resulting 
communication model, the communication is described in terms of actual wires and 
timing relationships as described by bus protocols. The communication model, 
which is the resulting output from the system-level design process, describes the 
system design. It models the mapping of the specification onto components from the 
architecture model enriched by information of the communication structure and 
communication protocols. 

The result of the synthesis flow is handed off to backend tools for compilation 
and high-level synthesis, as shown in the lower part of Figure 2 (back end flow). 

In the following sections, we describe the integration of SpecC into the 
PARADISE design environment and the resulting new design methodology. 



4. Example 

To demonstrate the extension of the PARADISE design environment with SpecC, we chose an application example. The voice encoder/decoder (vocoder), which is implemented in SpecC, is part of the European GSM standard for mobile telephone
networks. The lossy codec scheme was originally developed by Nokia and the 
University of Sherbrooke [Ja97] and is based on widely used algorithms for speech 
encoding [Sa98]. The so-called Enhanced Full Rate (EFR) speech transcoding is 
standardized by the European Telecommunication Standards Institute (ETSI) as 
GSM 06.60 [ETSI96]. 

The GSM 06.60 standard for the EFR vocoder is accompanied by a bit-exact 
reference implementation of the vocoder functionality consisting of 13,000 lines of C 
code. This code describes the required functionality and was therefore used as the 
basis for the SpecC specification. At the top level, the vocoder consists of independent encoding and decoding behaviors running in parallel. Encoding and decoding
transform a stream of speech samples at a rate of 104 kbit/s into an encoded bit 
stream with a rate of 12.2 kbit/s, and vice versa. Coding is based on a segmentation 
of the incoming speech into frames of 160 samples corresponding to 20 ms of 
speech. For each speech frame the coder produces 244 encoded bits. 
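The rates quoted above are mutually consistent. The short Python sketch below recomputes them from the frame parameters. The 8 kHz sampling rate and the 13 bits per sample are derived here from the stated figures; the text does not state them directly.

```python
# Worked check of the frame arithmetic quoted for the GSM 06.60 EFR vocoder.
FRAME_SAMPLES = 160    # samples per speech frame (from the text)
FRAME_MS = 20          # frame duration in milliseconds (from the text)
BITS_PER_FRAME = 244   # encoded bits per frame (from the text)
INPUT_RATE_KBPS = 104  # rate of the input sample stream (from the text)

sample_rate_hz = FRAME_SAMPLES * 1000 // FRAME_MS           # 160 samples / 20 ms
bits_per_sample = INPUT_RATE_KBPS * 1000 // sample_rate_hz  # implied sample width
coded_rate_kbps = BITS_PER_FRAME / FRAME_MS                 # bits per ms == kbit/s
compression = round(INPUT_RATE_KBPS / coded_rate_kbps, 2)

print(sample_rate_hz, bits_per_sample, coded_rate_kbps)  # → 8000 13 12.2
print(compression)                                        # → 8.52
```

The encoder therefore reduces the bit rate by a factor of about 8.5.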

The SpecC block diagram of the encoding part is shown in Figure 3. Only the first levels of the behavior hierarchy of the encoding part are shown. Altogether, the SpecC description of the vocoder contains 43 leaf behaviors. At the top level, pre-filtering and framing, speech coding, and bit serialization run in a pipelined fashion. At the next level, the first step in the coding process is an extraction of linear-prediction filter parameters. Each frame is then further subdivided into subframes of 40 samples (5 ms). In two nested loops, open- and closed-loop analyses of pitch filter parameters and an exhaustive search of a predefined codebook are performed, followed by a filter memory update step. For detailed information about the implementation of the vocoder in SpecC, the reader is referred to [GZGH99]. The SpecC source code of the specification, architecture and communication models can be downloaded from the SpecC web page [SC].




Figure 3. The vocoder encoding part 



5. Design Methodology 

The PARADISE design methodology is based on design dimensions and the structuring of each dimension by the P-chart (see Section 2). SpecC introduces some new features to the design environment and the underlying methodology. Figure 4 shows the integration of SpecC into the P-chart-based design process as implemented within PARADISE. SpecC links the specification domain (left in Fig. 4) on the system level to the modeling dimension (right side in Fig. 4). The specification model is refined stepwise as presented in Section 3. The designer of a complex ES compiles each SpecC model into an executable description. The simulation executable is used for prototyping and validation on the corresponding level. Once a model is validated, the designer passes it to the analysis design dimension within the PARADISE design environment. For example, the tool CHaRy^ is used for timing analysis of the SpecC model in the analysis dimension, as depicted in the lower part of Fig. 4.

CHaRy [A196, A197, StA197] is a software synthesis tool for periodic controller applications. CHaRy can guarantee hard real-time conditions. For reasons of complexity, CHaRy decomposes the overall problem of implementing periodic controllers on parallel embedded computers into the sub-problems of partitioning, timing analysis, allocation, and schedulability analysis. Since all these sub-problems are of high complexity, CHaRy provides efficient heuristics for each of them. Hence, CHaRy supports the mapping of controller models to a number of tasks (partitioning), the extraction of their computation times (timing analysis), and the assignment of tasks to a processor network (allocation), such that all hard real-time conditions are guaranteed (schedulability analysis).
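CHaRy's own heuristics are not detailed here. The Python sketch below only illustrates how allocation and schedulability analysis interact, and it substitutes two textbook techniques for CHaRy's methods. Allocation uses decreasing first-fit bin packing. Schedulability uses the EDF utilization test (sum of WCET/period at most 1 per processor), which holds for independent, preemptive periodic tasks whose deadlines equal their periods. The `Task` type and the function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: float    # worst-case execution time, e.g. from timing analysis
    period: float  # activation period of the periodic controller task


def allocate_first_fit(tasks, n_processors):
    """Assign tasks to processors, or return None if some task does not fit.

    Schedulability test (a swapped-in assumption, not CHaRy's): a processor
    is schedulable under preemptive EDF if its total utilization is <= 1.
    """
    load = [0.0] * n_processors
    assignment = {}
    # Placing the heaviest tasks first usually packs better (decreasing first-fit).
    for task in sorted(tasks, key=lambda t: t.wcet / t.period, reverse=True):
        utilization = task.wcet / task.period
        for p in range(n_processors):
            if load[p] + utilization <= 1.0:
                load[p] += utilization
                assignment[task.name] = p
                break
        else:  # no processor could take this task
            return None
    return assignment
```

For tasks with utilizations 0.6, 0.5 and 0.3, two processors suffice: the 0.6 and 0.3 tasks share one processor. A single processor is rejected because its load would exceed 1.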
Figure 4. SpecC in PARADISE

The communication model of the SpecC design flow is the result of the system-level synthesis process, describing the structure of the system in terms of system components connected via system busses. High-level synthesis of the custom
^ C-LAB Hard Real-Time System 
hardware components in the communication model is handled within the HW-synthesis design dimension. On the behavioral level, the PMOSS^ system can be used. The PMOSS system is a powerful platform for high-level synthesis of embedded hardware from a behavioral description. PMOSS divides the design process into several design tasks (e.g. HW/SW-partitioning, data-flow analysis and scheduling), which in turn may be subdivided into subtasks. Based on a well-structured database, at least one algorithm is available for each task or subtask; it reads the database and writes all results back to the same database. Thus, subsequent tasks have immediate access to all the results. The same concept holds for data input, which can be imported from different languages, and for data output at the interface to commercial and public-domain tools for logic synthesis. For detailed information, see [Ha95].
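The database-centred organisation described above resembles a blackboard pattern. Each task algorithm reads the shared store, and its results are written back so that later tasks can use them. The Python sketch below illustrates that idea only. `run_flow`, `dataflow_analysis` and `scheduling` are hypothetical stand-ins, not PMOSS code, and a plain dictionary replaces the PMOSS database.

```python
from typing import Any, Callable, Dict, List

Database = Dict[str, Any]
TaskAlgorithm = Callable[[Database], Dict[str, Any]]


def run_flow(db: Database, flow: List[TaskAlgorithm]) -> Database:
    """Run design tasks in order; each one reads the shared database and its
    results are written back, so later tasks see everything computed so far."""
    for algorithm in flow:
        db.update(algorithm(db))
    return db


# Hypothetical toy algorithms standing in for two design tasks.
def dataflow_analysis(db: Database) -> Dict[str, Any]:
    ops = db["operations"]
    return {"n_ops": len(ops), "op_types": sorted(set(ops))}


def scheduling(db: Database) -> Dict[str, Any]:
    # Trivial serial schedule: one operation per control step.
    return {"n_steps": db["n_ops"]}
```

Because each task is just a function over the database, an alternative algorithm for the same task can be swapped into `flow` without touching the others. This loosely mirrors the availability of several algorithms per task described in the text.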



6. Results 

In the following, we describe the results for some parts of the vocoder example. The closed_loop part of the vocoder model was analyzed with CHaRy. At the top level, the closed_loop is split into five different tasks (Figure 3): impulse response, target signal, pitch delay search, code vector computation and pitch gain calculation. CHaRy analyzed the worst-case execution time (WCET) of each of these tasks. The target architecture was a PowerPC. With CHaRy, we obtained the WCET estimates for each individual task, in terms of delay in μs and number of machine cycles, summarized in Table 1. The overall run-time for the closed_loop is 89854 μs and 8064786 cycles.



Task from closed_loop     WCET (μs)   WCET (cycles)
impulse response               4595          454261
target signal                 11853         1085458
find pitch delay              60731         5432123
compute code vector           10747          923495
calculate pitch gain           1796          156312

Table 1. WCET for closed_loop tasks
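Summing the table gives a quick consistency check. The sketch below shows that the five per-task estimates add up to slightly less than the reported overall run-time, by 132 μs and 13,137 cycles. It also shows that the pitch delay search accounts for about two thirds of the total. The text does not say what the remainder covers.

```python
# Per-task WCET estimates for closed_loop, copied from Table 1: (μs, cycles).
WCET = {
    "impulse response": (4595, 454261),
    "target signal": (11853, 1085458),
    "find pitch delay": (60731, 5432123),
    "compute code vector": (10747, 923495),
    "calculate pitch gain": (1796, 156312),
}
REPORTED_TOTAL = (89854, 8064786)  # overall closed_loop run-time from the text

sum_us = sum(us for us, _ in WCET.values())
sum_cycles = sum(cycles for _, cycles in WCET.values())
remainder = (REPORTED_TOTAL[0] - sum_us, REPORTED_TOTAL[1] - sum_cycles)
share = WCET["find pitch delay"][0] / REPORTED_TOTAL[0]

print(sum_us, sum_cycles)  # → 89722 8051649
print(remainder)           # → (132, 13137)
print(f"{share:.1%}")      # → 67.6%
```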

The LP analysis (short-term analysis) was analyzed with the high-level synthesis tool PMOSS. For our example, we use the high-level transformation and synthesis process of PMOSS. The SpecC description of the LP_analysis was transformed into a data-flow graph (DFG) and a control-data-flow graph (CDFG). Furthermore, PMOSS generates a controller and a datapath. The LP_analysis function is optimized by several high-level transformations. In fact, high-level design space exploration is enabled by the transformation task. Available high-level transformations include loop unrolling, constant propagation, dead code elimination, elimination of temporary data elements, as well as algebraic transformations. High-level transformations result in an optimized high-level description of the LP analysis. All examined points of the design space can be visualized as a transformation graph for feedback to the designer. The partitioned and optimized LP analysis description can now be passed to the synthesis task, as shown in Figure 5. Within this task, the synthesis sub-tasks functional unit (FU) scheduling, FU allocation, FU binding, register (REG) allocation, REG binding, interconnection and finally netlist generation are performed. PMOSS provides several algorithms for each synthesis sub-task, which allows the designer to direct the optimizations. The DFG of the LP_analysis has 24 nodes and the CDFG has 128 nodes. The datapath contains 7 FUs and 7 registers and uses 13 multiplexers. The controller has 25 states and 57 transitions. The register-transfer level netlist can be stored in different formats, for example VHDL.
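The text does not name the scheduling algorithms that PMOSS offers. As an illustration of the FU scheduling sub-task only, here is the textbook as-soon-as-possible (ASAP) scheduler applied to a DFG. It assumes that every operation takes one control step and that functional units are unlimited. The function name and the graph encoding are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def asap_schedule(dfg: dict[str, set[str]]) -> dict[str, int]:
    """ASAP scheduling: each node gets the earliest control step after all of
    its predecessors (one step per operation, unlimited functional units).

    dfg maps each operation to the set of operations it depends on.
    """
    step: dict[str, int] = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(dfg).static_order():
        preds = dfg.get(node, set())
        step[node] = 1 + max((step[p] for p in preds), default=0)
    return step


# Two multiplications feed an addition, whose result feeds a subtraction.
dfg = {"m1": set(), "m2": set(), "a1": {"m1", "m2"}, "s1": {"a1", "m2"}}
print(asap_schedule(dfg))  # m1 and m2 in step 1, a1 in step 2, s1 in step 3
```

The largest step number is the schedule length. Once the number of FUs is fixed, as in the 7-FU datapath above, a resource-constrained method such as list scheduling would be needed instead.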




Figure 5. PMOSS synthesis process (recoverable output formats: VHDL, BLIF, KISS)



7. Conclusion 

In this paper we presented the integration of the specification language SpecC into 
the PARADISE design environment. SpecC fulfills all requirements for the design 
dimensions specification and modeling within PARADISE. The existing tools 
CHaRy and PMOSS are used for timing-analysis and high-level synthesis with a link 
to logic synthesis. Therefore, the integration of SpecC into PARADISE results in a 
closed design flow from system specification down to implementation. The results of 
the vocoder example reflect the usability of the presented methodology and the design environment.
This paper presents three methods for the supervision and optimisation of a controller by means of an exemplary application. The supervision is used for error recognition, analysis, and processing. The basic principle comprises the use of different controllers: a robust controller as a fail-safe device and a controller to be tested are implemented. In the case of an error during the tests, the system automatically switches to the robust controller. This mechanism is implemented by means of a finite-state machine. A precision supervision compares setpoints and measured values. The difference is weighted and transformed into discrete events which affect the switching between the different controllers. Through a spectrum analysis appropriate for real-time use, supervision is effected in the frequency domain. The results of the computation can be used directly for online optimisation. It is shown that, in the real-time case, a synchronous computation of the frequency spectrum is more useful than an asynchronous one. The results are presented with a current application from the domain of railway technology: a suspension/tilt-module testbed employed in the research project “Neue Bahntechnik Paderborn”.



1. Introduction 

Ever-increasing demands on the complexity, features, and possibilities linked with mechatronic systems (e.g., operational range, security, comfort, and costs) can no longer be met by simple adaptive controls alone.

The potential inherent in novel circuit techniques and ever more powerful microprocessors, however, opens up a new dimension for online optimization and supervision that goes beyond the conventional view of the adaptability (Isermann, Lachmann, Matko 1992) of a system (esp. in the domain of adaptive controls).

^MLaP, Pohlweg 98, D-33098 Paderborn, http://www.MLaP.de 
The tasks relating to online supervision and optimisation can generally be structured as follows:

□ Supervision of the hardware, e.g., linking of the sensors and the state of the 
processor. 

□ Improvement of the controller function by an optimisation. 

□ Supervision of the software to ensure controller stability and precision. 

Hardware-in-the-loop (HiL) testbeds used to test mechatronic components are mechatronic systems as well.
Here manual or automated tests and optimisations are performed. In most cases the 
level of the electrical, the mechanical and the hydraulic power during the tests and/or 
the optimisations is quite high. The risk of damage for man or machine has to be 
minimised as far as possible. Switching off the energy supply is not the best solution 
in every possible case. An automated fail-safe reaction provided by the information 
processing is the better alternative in comparison with (slow) manual interactions. 
That is why the testbed has to be equipped to ensure safety even in cases of operating 
errors caused by the user or software. These experiences are our motivation for the 
work presented. 

We are going to demonstrate a controller switching (from test to robust and back), 
a controller precision supervision, and finally a spectral-resonance supervision for a 
HiL testbed. 



The Design of Mechatronic Systems 

At the Mechatronics Laboratory Paderborn (MLaP) we developed the software environment CAMeL (Computer-Aided Mechatronics Laboratory) (Meier-Noe, Hahn, 1999). It supports studies of the dynamic behaviour that precede and then accompany the constructive design (Lückel, Toepper, et al., 1999).

All components and their arrangement are represented in the computer by means of graphical support. Every component of the entire system is described in its discipline-specific manner. Thus a thorough description of rigid bodies as elements of the mechanical supporting structure, e.g., requires information on the respective mass, inertia tensor and position of the points relating to coupling, application of force, and measurement. In a subsequent step, this structural representation of the mechatronic system by its different discipline-specific descriptions is made uniform in a mathematical-symbolical representation. All basic elements are transformed into the explicit state-space representation, and the subsystems are coupled by their input/output relations. This mathematical model is then transformed into a numerical one that the computer can process and evaluate. As a result, the motion behaviour is computed, yielding information on time and frequency behaviour, stability and sensitivity, and indicating whether or not the system functions properly. Now, in addition to the geometrical dimensions, other essential physical variables are available to the developer, e.g., forces and torques arising in the system, pressures, mechanical stresses and electrical voltages. Moreover, the stiffness of the system and its natural modes of oscillation can be analyzed.
The Structure of Hybrid Information Processing

To supervise or optimise the controllers, a discrete component that we call the operator is assigned to each continuous controller.

This newly introduced entity, comprising controller and operator, is named the Operator-Controller Module (Naumann, 2000). Figure 1 shows the basic structure of an Operator-Controller Module. The operator retrieves information from the controller and updates the latter if necessary. The update may be limited to simple parameter tuning but can even imply switching of the controller structures.
Figure 1. The Operator-Controller Module.

In this way, a hybrid system in the sense of a mixed discrete/continuous system is created. But this simple subdivision into discrete and continuous parts hardly constitutes a powerful concept for the design of complex mechatronic systems.
Thus, combining the operator-controller module with a modular and hierarchical 
structuring of mechatronic systems is the next step to take (Honekamp, et al., 1997). 

At the lowest level of complex systems we define the Mechatronic Function Modules (MFM). A Mechatronic Function Module consists of a passive mechanical frame, sensors, actuators, and hybrid information processing. MFMs can themselves form a hierarchical structure.

The next level of hierarchy is that of the Autonomous Mechatronic System 
(AMS). An AMS consists of a passive mechanical frame, sensors, and information 
processing. It has no actuators of its own but uses the mechanically coupled MFMs 
for actuation. The highest hierarchical level is that of Crosslinked Mechatronic 
Systems, CMS in short. On this level several AMSs are linked together by a 
communication infrastructure to make up a co-operating system. Co-operating robots 
or automobiles are examples of CMS. 

We have here a highly generalised structuring concept. In this paper we want to 
direct your attention to the supervision of a hardware-in-the-loop testbed for a 
railway suspension/tilt module. 
Our Example: Testbed for a Railway Suspension/Tilt Module

Within the project “Neue Bahntechnik Paderborn” (http://nbp-www.upb.de), a modular railway concept is currently being elaborated that combines modern undercarriage technology with the advantages of the Transrapid and the use of existing railway tracks (Lückel, Grotstollen, et al., 1999).

Its actively controlled suspension/tilt module (Figure 2), which realises the spring concept for the carriage body of the shuttles, serves as an example. The carriage body, supported by air springs, is damped and tilted by means of a hydraulic base displacement at the upper plate.




Figure 2. The railway suspension/tilt module. 

The main task of the controller is thus to provide active damping of the carriage body. This can be achieved by using the actuators to generate forces proportional to the relative velocity between body and upper plate. The damping frequency range is limited by the bandwidth of the controller. The hydraulic valves are controlled by a hybrid information processing (Operator-Controller Modules) connected via sensors, amplifiers, and DA/AD converters.

To design this module, the MLaP has a hardware-in-the-loop testbed at its disposal (Lückel, Liu, et al., 2000). Our testbed includes additional hydraulic actuators to generate synthetic forces and displacements of the primary suspension. These actuators are not shown in Figure 2.

The testbed can be seen as an AMS (Figure 3), providing the mechanical, hydraulic and informational coupling of the underlying MFMs. These MFMs are the hydraulic power supply, the controlled testbed actuators, and the controlled suspension/tilt module itself. The AMS and each MFM contain their own Operator-Controller Module. In the following, we refer to this structure to explain the implemented testbed operator and the assessment functions for controller switching.



Figure 3. Hierarchical structure of the testbed (recoverable labels: Testbed; Hydraulic Supply, Testbed Actuators, Suspension/Tilt Module; legend: physical and mechanical coupling, information exchange).



2. Real-time Testbed Supervision 

The testbed operator manages the different testbed states to provide a fail-safe and 
transparent interaction with the user of the testbed. 

This component is located within the operator part on the AMS level of the 
system. For the testbed, five states are defined (Figure 4). Within the initial state, the hydraulic supply is switched off. This state can be reached from every other state except the Fade state when the turnOff event is triggered. In the Robust Control state
the hydraulic supply is switched on while a robust controller is active. From there the 
user can switch to the Fade state in which the robust controller and the controller 
under test crossfade automatically. 

In the Test Control state the system supervision can automatically decide to 
switch to the Test Failed state which means a return to robust control. The switching 
is effected by the local MFM operator. The automatic fail test is assessed 
synchronously within the local MFM controller section, resulting in a well-balanced 
computational load. With Test Failed the system can be switched off or a new test 
with altered test controller parameters can be started. 
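The testbed management above can be sketched as a table-driven finite-state machine. Only the five states and the turnOff event come from the text. The other event names (turnOn, startTest, fadeDone, testFailed) are hypothetical. Re-entering Fade when a new test is started from Test Failed is also an assumption made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    OFF = auto()             # initial state: hydraulic supply switched off
    ROBUST_CONTROL = auto()  # supply on, robust controller active
    FADE = auto()            # robust controller and test controller crossfade
    TEST_CONTROL = auto()    # controller under test active
    TEST_FAILED = auto()     # supervision detected an error: robust control again


# (state, event) -> next state. Event names other than turnOff are hypothetical.
TRANSITIONS = {
    (State.OFF, "turnOn"): State.ROBUST_CONTROL,
    (State.ROBUST_CONTROL, "startTest"): State.FADE,
    (State.FADE, "fadeDone"): State.TEST_CONTROL,
    (State.TEST_CONTROL, "testFailed"): State.TEST_FAILED,
    (State.TEST_FAILED, "startTest"): State.FADE,  # new test, altered parameters
}


def step(state: State, event: str) -> State:
    """Return the next state; events without a transition leave it unchanged."""
    # turnOff leads to OFF from every state except FADE.
    if event == "turnOff" and state is not State.FADE:
        return State.OFF
    return TRANSITIONS.get((state, event), state)


state = State.OFF
for event in ["turnOn", "startTest", "turnOff", "fadeDone", "testFailed", "turnOff"]:
    state = step(state, event)
print(state)  # → State.OFF
```

In this trace, the turnOff issued during Fade is ignored. The final turnOff, issued from Test Failed, returns the testbed to the initial state.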
Figure 4. State chart of the discrete suspension/tilt testbed management.



Real-time Actuator Precision Supervision 

To ensure safe operation it is very useful to monitor the precision of the hydraulic 
actuator control. 

This task is located in the respective controller section both of the testbed MFM 
for actuation and of the suspension/tilt module MFM. It is implemented as a 
synchronous dataflow which compares the displacement of the actuator with that of a 
reference model (Figure 5). 
Figure 5. Dataflow of the actuator precision supervision.
Absolute and relative limits can be specified to determine whether the limit check succeeds or fails. The computed actuator deviation from the reference displacement can be used directly as a cost function for online optimisation. The operator sections of the suspension/tilt MFM and the testbed MFM comprise the components to switch between the robust controller and the controller under test.
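A minimal sketch of one sampling step of the precision check follows. It assumes that the absolute and relative limits are combined into a single tolerance, abs_limit + rel_limit · |reference|. The text does not state how the two limits interact, so this rule, the class, and its parameter names are hypothetical. A failed check corresponds to the discrete event that triggers the switch back to the robust controller (the Test Failed state described above). The running sum of squared deviations illustrates how the deviation can serve as a cost.

```python
from dataclasses import dataclass


@dataclass
class PrecisionSupervisor:
    """Hypothetical actuator precision check, evaluated once per sample.

    The tolerance rule abs_limit + rel_limit * |reference| is an assumption.
    """
    abs_limit: float
    rel_limit: float
    cost: float = 0.0  # running sum of squared deviations (usable as a cost)

    def check(self, measured: float, reference: float) -> bool:
        deviation = measured - reference  # actuator vs. reference model
        self.cost += deviation * deviation
        tolerance = self.abs_limit + self.rel_limit * abs(reference)
        return abs(deviation) <= tolerance  # False -> raise a failure event


sup = PrecisionSupervisor(abs_limit=0.5, rel_limit=0.1)
print(sup.check(10.8, 10.0))  # → True  (deviation 0.8, tolerance 1.5)
print(sup.check(3.0, 1.0))    # → False (deviation 2.0, tolerance 0.6)
```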

Real-time Spectral Evaluation

To supervise the hydraulic valve sliders in the frequency domain, we implemented a real-time, dataflow-based discrete Fourier transformation (DFT) algorithm. This task is implemented in a synchronous manner (as a filter) and is therefore placed in the suspension/tilt controller section.

Model-based reference signals are compared with measured signals by computing a least-squares cost function (Figure 6). The result of the spectra comparison can be used directly as a cost function for online optimisation.
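The text does not give the exact form of the least-squares cost. One plausible form, stated here as an assumption, compares the magnitude spectra of the measured signal $y$ and the model-based reference $y_{\mathrm{ref}}$ over the $N$ DFT bins, with optional non-negative weights $w_k$ per bin (for instance, all equal to 1):

$$
J = \sum_{k=0}^{N-1} w_k \left( |Y(k)| - |Y_{\mathrm{ref}}(k)| \right)^2,
\qquad
Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-j 2\pi k n / N}
$$

A small $J$ indicates that the valve slider behaves like the reference model in the monitored frequency range. Because $J$ is a single scalar, it can serve directly as the optimisation criterion.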

The implemented synchronous DFT has a higher multiplication complexity, $O(N^2)$, than an asynchronous one (e.g., an FFT, $O(N \log N)$), which would be able to exploit symmetries massively (Johnson, 1991). For the synchronous DFT filter, only standard optimisations like the use of register variables and sine/cosine tables are possible. For real-time computation, however, the synchronous solution has two major advantages:

□ Results are obtained sooner and with deterministic timing (immediately after
each DFT sampling period).

□ The computational load is well balanced because the computation is performed
stepwise, at every single sampling step within the DFT sampling period.
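The stepwise computation can be sketched in Python as follows. This is an illustration under our own assumptions, not the CAMeL-View component. One DFT sampling period consists of n samples. Each incoming sample is immediately accumulated into all n frequency bins using precomputed cosine/sine tables, which costs about 2n multiplications per sample and hence O(N^2) per period. The magnitude spectrum is returned right after the n-th sample. A least-squares cost between a reference and a measured spectrum is included as well; all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


class SynchronousDFT:
    """Stepwise (synchronous) DFT over blocks of n samples (illustrative sketch)."""

    def __init__(self, n: int):
        self.n = n
        # Sine/cosine tables indexed by (k * m) mod n avoid trigonometric calls at run time.
        self.cos_table = [math.cos(2 * math.pi * i / n) for i in range(n)]
        self.sin_table = [math.sin(2 * math.pi * i / n) for i in range(n)]
        self._reset()

    def _reset(self) -> None:
        self.re = [0.0] * self.n
        self.im = [0.0] * self.n
        self.m = 0  # position of the next sample within the DFT sampling period

    def step(self, x: float) -> Optional[List[float]]:
        """Fold one sample into all n bins (about 2n multiplications).

        Returns the magnitude spectrum right after the n-th sample, else None.
        """
        for k in range(self.n):
            idx = (k * self.m) % self.n
            self.re[k] += x * self.cos_table[idx]
            self.im[k] -= x * self.sin_table[idx]
        self.m += 1
        if self.m < self.n:
            return None
        spectrum = [math.hypot(r, i) for r, i in zip(self.re, self.im)]
        self._reset()  # start the next DFT sampling period
        return spectrum


def least_squares_cost(reference: Sequence[float], measured: Sequence[float]) -> float:
    """Sum of squared differences between a reference and a measured spectrum."""
    return sum((r - m) ** 2 for r, m in zip(reference, measured))


# A cosine at bin 1 of an 8-point DFT gives magnitude n/2 = 4 at bins 1 and 7.
dft = SynchronousDFT(8)
for m in range(8):
    spectrum = dft.step(math.cos(2 * math.pi * m / 8))
print([round(v, 6) for v in spectrum])  # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```

Each call to `step` does the same amount of work, which is the load-balancing property described above. A block-based FFT would do all of its work once per period instead.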
Figure 6. Dataflow of the spectral supervision of the valve slider.



Real-time Aspects 

The phase lag caused by sampling and computation in a digital realisation must be
taken into account, because it has a strong impact on the achievable controller
bandwidth.

One way of modelling the phase lag is to use a dead-time transfer function
(Middleton, 1990). It can be shown that the dead time of a computation contributes
twice as much phase lag as sampling with the same period. Consequently, our digital
realisation has two real-time conditions. Firstly, the computation must not overrun
the sampling period. Secondly, the time delay between AD and DA conversion must be
minimised. The second condition can only be stated precisely if a maximum phase lag
is specified.
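The two delay contributions can be made concrete with a short calculation. The Python sketch below relies on the common approximation that sampling with a zero-order hold acts like a dead time of half the sampling period T. The computation delay between AD and DA conversion, by contrast, enters with its full length. A dead time T_d has the transfer function e^{-s T_d} and therefore a phase lag of ω T_d. The function name is hypothetical.

```python
import math


def phase_lag_deg(freq_hz: float, sample_period_s: float, compute_delay_s: float) -> float:
    """Phase lag in degrees at freq_hz for a sampled controller (assumed model).

    A dead time e^{-s*Td} contributes a phase lag of omega*Td. Sampling with a
    zero-order hold is approximated by Td = T/2; the AD-to-DA computation delay
    contributes its full duration.
    """
    omega = 2 * math.pi * freq_hz
    dead_time_s = sample_period_s / 2 + compute_delay_s
    return math.degrees(omega * dead_time_s)


# 1000 us sampling period, evaluated at 90 Hz:
print(round(phase_lag_deg(90.0, 1e-3, 0.0), 1))   # 16.2  (sampling only)
print(round(phase_lag_deg(90.0, 1e-3, 1e-3), 1))  # 48.6  (computation filling the whole period)
```

In this model, starting the DA conversion as soon as the controller output is available reduces `compute_delay_s`, which is the dominant term.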
Figure 7. Optimised evaluation to minimise the phase lag of the controller.

It is therefore important to provide an efficient evaluation order within the
sampling period, which is assumed to be 1000 µs in Figure 7. To minimise the phase
lag, one has to:

□ delay the effect of the assessment computation to the next sampling period, 

□ start the DA conversion as soon as the required controller output is available. 
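The evaluation order above can be illustrated with a small Python simulation. It is a sketch with hypothetical controller and assessment functions. Within each sampling period, the controller output is written to the DA converter immediately after it is computed. The assessment runs afterwards, so its result (here a controller parameter) only takes effect in the next period.

```python
from typing import Callable, List, Tuple


def run_periods(samples: List[float],
                controller: Callable[[float, float], float],
                assessment: Callable[[float, float], float],
                initial_param: float = 0.0) -> Tuple[List[float], List[str]]:
    """Simulate the assumed evaluation order within each sampling period."""
    param = initial_param
    outputs: List[float] = []
    trace: List[str] = []
    for k, x in enumerate(samples):
        trace.append(f"{k}:AD")          # 1. read the AD converter
        u = controller(x, param)         # 2. controller uses last period's assessment
        trace.append(f"{k}:CTRL")
        outputs.append(u)                # 3. start DA conversion immediately
        trace.append(f"{k}:DA")
        param = assessment(x, u)         # 4. assessment; effect delayed to period k + 1
        trace.append(f"{k}:ASSESS")
    return outputs, trace


outputs, trace = run_periods([1.0, 2.0],
                             controller=lambda x, p: x + p,
                             assessment=lambda x, u: 10.0 * x)
print(outputs)    # [1.0, 12.0]
print(trace[:4])  # ['0:AD', '0:CTRL', '0:DA', '0:ASSESS']
```

The assessment computed in period 0 (10.0) first appears in the controller output of period 1. This is the deliberate one-period delay that keeps the AD-to-DA path short.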



Real-time Components 

Further description elements recently developed for our modelling tool CAMeL-View
are the synchronous spectrum analyser presented above, finite-state automata, and
DA/AD converter elements.

These description elements serve as predefined and validated components to 
design complex assessment and supervision models. 

For the real-time realisation we used our simulation platform IPANEMA (Honekamp,
1998), which has been ported to run on the dSPACE Real Time Kernel
(Otterbach, Leinfellner, 1999). Figure 8 shows the respective layers. We use the
dSPACE DS1005 processor board, which is based on the Motorola PowerPC 750 series,
and various dSPACE I/O components. These components are connected to the
processor board via the so-called PHS-bus. Like our simulation platform IPANEMA,
the DS1005 board supports multiprocessing. Several processor boards can be
interconnected via a very fast optical link called GIGALINK.
Figure 8. Layers of the real-time hard- and software.



3. Results 

Examples from the time and the frequency domain were presented to illustrate our
approach to the supervision and support of online optimisation.

The phase lag problem of digital realisation and our real-time software and
hardware components were described in detail. The most important result, however,
is that we have successfully demonstrated our application on the suspension/tilt
testbed under hard real-time conditions. The cycle time of the application deviates
from its average by less than 1 %, even during state switching of the discrete
operator parts.

Within the Test Control state of the testbed, a manual alteration of controller
parameters could lead to a larger deviation from the actuator displacement
setpoints. Exceeding the limit of 2 mm does in fact trigger an automatic switch to the
Test Failed state, in which the robust controller then successfully reduces the
deviation to a tolerable value.

With an inappropriate controller, the valve sliders could start oscillating at their
resonance frequency of about 90 Hz when excited with a step function. This would
produce a corresponding peak in the auto-spectrum, and the application would in fact
switch automatically to the Test Failed state. The robust controller then stops
this oscillation.
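The switching behaviour reported above can be summarised as a small state machine. The Python sketch below is our simplified reading of it, not a transcription of the state chart in Figure 4. In Test Control, either a displacement deviation above 2 mm or a resonance peak above a threshold causes a switch to Test Failed, where the robust controller is active. The peak threshold, the OFF state and the "restart"/"off" commands are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    TEST_CONTROL = auto()  # controller under test is active
    TEST_FAILED = auto()   # robust controller is active
    OFF = auto()           # hypothetical: system switched off


class TestbedSupervisor:
    """Simplified Test Control / Test Failed switching (illustrative sketch)."""

    def __init__(self, deviation_limit_mm: float = 2.0, peak_threshold: float = 1.0):
        self.mode = Mode.TEST_CONTROL
        self.deviation_limit_mm = deviation_limit_mm
        self.peak_threshold = peak_threshold  # hypothetical auto-spectrum threshold

    def active_controller(self) -> str:
        if self.mode is Mode.TEST_CONTROL:
            return "under_test"
        return "robust" if self.mode is Mode.TEST_FAILED else "none"

    def update(self, deviation_mm: float, resonance_peak: float,
               command: Optional[str] = None) -> Mode:
        if self.mode is Mode.TEST_CONTROL:
            # Either supervision result (time or frequency domain) aborts the test.
            if deviation_mm > self.deviation_limit_mm or resonance_peak > self.peak_threshold:
                self.mode = Mode.TEST_FAILED
        elif self.mode is Mode.TEST_FAILED:
            if command == "restart":   # new test with altered parameters
                self.mode = Mode.TEST_CONTROL
            elif command == "off":
                self.mode = Mode.OFF
        return self.mode


s = TestbedSupervisor()
print(s.update(deviation_mm=2.5, resonance_peak=0.1), s.active_controller())
# Mode.TEST_FAILED robust
```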

More generally, we have built up a testbed with a fail-safe layer that protects it
from invalid user operations. This fail-safe behaviour is also of utmost importance
for safe online optimisation.
4. Outlook

The next step will be to implement not only the cost functions but also a proper
online optimisation algorithm as a real-time component (Kasper, Lückel, Jäker,
Schroer, 1990).

We have seen that some assessments (cost functions) are extremely expensive to
compute. This can be remedied by treating them as separate modules that are either
assigned processors of their own or given a sampling rate of their own, so that the
smallest possible latency can be obtained for the controllers. The next step will
therefore be to implement operator, controller and assessment as multiple
pre-emptive tasks with different priorities and sampling rates. Distributed
processing of these tasks will then also be within reach, because both the
simulation platform IPANEMA and the DS1005 processor board are already designed
for distributed processing.
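The Python sketch below illustrates why moving expensive assessments into slower tasks helps. It assigns priorities rate-monotonically and applies the Liu and Layland utilisation bound, which is a sufficient but not necessary schedulability test. Both the rate-monotonic policy and the task timings are our assumptions; the text only states that the tasks will have different priorities and sampling rates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    period_us: float  # sampling period of the task
    wcet_us: float    # worst-case execution time (hypothetical figures)


def rate_monotonic_order(tasks: List[Task]) -> List[str]:
    """Shorter sampling period -> higher priority (assumed policy)."""
    return [t.name for t in sorted(tasks, key=lambda t: t.period_us)]


def passes_liu_layland(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) test: sum(C/T) <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    utilisation = sum(t.wcet_us / t.period_us for t in tasks)
    return utilisation <= n * (2 ** (1 / n) - 1)


controller = Task("controller", period_us=1000, wcet_us=300)
assessment = Task("assessment", period_us=10000, wcet_us=4000)
print(rate_monotonic_order([assessment, controller]))  # ['controller', 'assessment']
print(passes_liu_layland([controller, assessment]))    # True  (U = 0.7 <= ~0.83)
print(passes_liu_layland([controller, Task("assessment", 1000, 4000)]))  # False
```

Running the same assessment at the controller's 1000 µs rate would overload the processor. At a tenth of the rate, it fits beneath the controller without adding to the controller's latency.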
A PRODUCT FAMILY APPROACH TO
GRACEFUL DEGRADATION

William Nace 
Philip Koopman 



Design of gracefully degrading systems, where functionality is 
gradually reduced in the face of faults, has traditionally been a 
very difficult and error-prone task. General approaches to 
graceful degradation are typically limited to re-implementation of 
the system for a number of pre-designated fallback configurations. 
We describe an architecture-based approach to gracefully 
degrading systems based upon Product Family Architectures 
(PFAs) combined with automatic reconfiguration. 

A PFA is a region of a system design space populated by different, 
but related, products sharing similar architectures and 
components. Each system instance within a PFA yields a distinct 
price/performance point, and represents a different model in the 
product family. The unifying mechanism that joins PFAs and 
gracefully degrading systems is automatic reconfiguration - in the 
face of a fault, the system reconfigures to a different PFA 
configuration point that optimizes the functionality available with 
the remaining resources. In this process, the system sheds some 
of the non-critical functions that make up such a large percentage 
of modern embedded systems. System designers can also exploit a 
reconfiguration mechanism to provide graceful upgrade and 
unique logistical benefits. The RoSES (Robust Self-configuring 
Embedded Systems) project employs such a reconfiguration 
approach, seeking to create a revolutionary means to build
self-customizing, distributed, embedded control systems.



1. Introduction 

Embedded applications such as transportation systems, power distribution, 
telecommunications, construction equipment and weapon systems are moving 
toward highly distributed implementations. As a result, traditional centralized 
approaches are being replaced by federated systems in which many processors 
collaborate to provide system functionality. This trend certainly is not universal, as
integration is another common architectural style - especially in avionics [1]. 
However, the very modular nature of such integrated systems allows the same 
concepts to apply within subsystems, as well. If the promise of MEMS 
(Microelectromechanical Systems) devices based on standard semiconductor process 
technology comes to fruition, it will soon be possible for most sensors and actuators 
to have their own inexpensive integrated microcontrollers, accelerating the trend 
toward federated systems. 

A particularly demanding pair of requirements for many distributed embedded 
systems is that they be both inexpensive and dependable. Fortunately, distributed 
systems have an inherent capability to spread functionality across many nodes. 
While it may be that brute-force redundancy is the only way to satisfy stringent 
reliability requirements for critical functions, not every function is critical. In fact, 
much of the increasing computing power in embedded systems provides extra 
functionality or performance optimization rather than basic critical functions. It may 
be acceptable for optimization functions to be shed by a system as components fail, 
so long as this is done in a safe and controlled manner. For example, losing a few 
percent of fuel economy is probably preferable to a complete vehicle failure. 

Thus, there is room in many embedded systems to implement graceful 
degradation of functionality as a way to improve dependability for non-critical (but 
highly desirable) functions. A gracefully degrading system is one in which faults are 
masked and only manifest themselves in a reduced level of system functionality. 

In fact, a few systems implement graceful degradation today, but use labor-intensive
development techniques that often involve specific engineering efforts for
every anticipated failure mode [4]. As an example, a car transmission controller 
might be able to substitute for a failed engine controller, but do so with only very 
simple and inefficient engine operation. Such traditional approaches usually 
accomplish graceful degradation using a combination of replication and failover 
algorithms. Alternative approaches include multi-version redundancy and load 
sharing. The former is too expensive for non-critical functionality, while the latter 
usually provides only graceful performance degradation for a fixed set of 
functionality, potentially causing problems in real-time systems. 

We propose that graceful degradation should not be treated as a failover design 
problem, but instead as an exercise in designing a product family architecture (PFA).
A PFA is a region of a system design space populated by different, but related,
products sharing similar architectures and components. Each system instance within
a PFA yields a distinct price/performance point, and represents a different model in
the product family. The collection of system instances and the relations between
them form a graph or lattice, an example of which is shown in Figure 1. The concept
of a PFA is familiar to anyone who has purchased a stereo, computer or automobile.
However, optimization for product families is typically done assuming a perfectly 
working system rather than with an eye toward graceful degradation. 
Figure 1: A PFA Lattice



Consider a product implemented by assembling dozens or even hundreds of 
different “smart” components (i.e., components incorporating microcontrollers) into 
a fine-grain distributed embedded system. There may be a huge number of different 
product instances possible. And, if a suitable way to allocate functionality can be 
provided, any system in which a single component breaks can be treated simply as a 
closely related system in the PFA that (using a fail-silent assumption) just happens to 
differ in having that failed component missing from it. Thus, PFAs can form a 
conceptual framework for specifying and implementing graceful degradation within 
highly distributed embedded systems. 
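The Python sketch below illustrates how a failed component simply moves the system to a neighbouring product. The model is ours and is not a RoSES interface. A product is a set of components, and the family contains every subset accepted by a validity predicate. Lattice edges connect products that differ by one component. Under the fail-silent assumption, a failure maps the current product to the same set minus the failed component. Component names and the validity rule are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Set, Tuple

Product = FrozenSet[str]


def product_family(components: Set[str], is_valid: Callable[[Product], bool]) -> Set[Product]:
    """All component subsets that form a valid product."""
    family: Set[Product] = set()
    for r in range(len(components) + 1):
        for combo in combinations(sorted(components), r):
            p = frozenset(combo)
            if is_valid(p):
                family.add(p)
    return family


def lattice_edges(family: Set[Product]) -> Set[Tuple[Product, Product]]:
    """Edges (larger, smaller) between products differing by exactly one component."""
    return {(a, b) for a in family for b in family if b < a and len(a - b) == 1}


def after_failure(current: Product, failed: str, family: Set[Product]) -> Optional[Product]:
    """Fail-silent failure: the system becomes 'current minus the failed part', if that is a product."""
    degraded = current - {failed}
    return degraded if degraded in family else None


components = {"engine_ctrl", "brake_ctrl", "cruise", "fuel_opt"}
critical = {"engine_ctrl", "brake_ctrl"}
family = product_family(components, lambda p: critical <= p)
full = frozenset(components)
print(len(family), len(lattice_edges(family)))          # 4 4
print(sorted(after_failure(full, "fuel_opt", family)))  # ['brake_ctrl', 'cruise', 'engine_ctrl']
print(after_failure(full, "brake_ctrl", family))        # None
```

Losing an optional component lands on a lower point of the lattice. Losing a critical one leaves the family altogether, which is where graceful degradation ends.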

RoSES (Robust Self-configuring Embedded Systems) is a new research project 
whose goal is to create architectures for automatic graceful degradation in embedded 
systems. A discussion of RoSES follows in Section 2. Generic reconfiguration 
mechanisms we believe critical to such a PFA framework are discussed in Section 3. 
Section 4 discusses some interesting logistical opportunities made available with a 
good reconfiguration mechanism. We do, however, believe there are some very 
difficult problems with reconfiguration mechanisms that might preclude their 
ubiquitous use. Such problems are explored in Section 5. 



2. Reconfiguration in RoSES for graceful degradation

RoSES is a newly created research project investigating a PFA-based approach to 
obtain graceful degradation and other significant benefits, initially on automotive 
applications. The basic concept is to represent a system as a set of: 

• System requirements with associated utility functions (critical functions are 
mandatory; others have various quantified utility levels) that form a lattice 
of acceptable systems, 

• System constraints such as network schedules or task deadlines,

• Abstract functional blocks that satisfy various requirements (with associated
software modules),

• Hardware resources, including “smart” sensors, “smart” actuators,
compute-server nodes, and one or more embedded networks such as a CAN
(Controller Area Network) bus, and

• A binding between software (representing a selected subset of functional 
blocks) and hardware that forms a particular point within a PFA space, 
optimizing utility given available resources. 

A RoSES system is a generic runtime architecture that works by providing a
particular optimum configuration. This involves selecting a subset of possible
software modules, allocating them to hardware resources, and ensuring that the
resultant system meets real-time constraints without exceeding system size or
bandwidth limits. In order to match standardized hardware and software components
to a large variety of system configurations, RoSES uses mobile object adapters. Such 
adapters form a flexible software interface middleware layer between, on one side, 
the basic functionality of the sensor/actuator and, on the other side, a dynamic 
network object interface. The role of the adapters within the RoSES system concept 
is illustrated in Figure 2. 



Figure 2: The RoSES System Concept (smart sensors and smart actuators)

Once a particular configuration is established, a component failure (either 
hardware or software) triggers a system reconfiguration. The RoSES reconfiguration 
concept is a fairly fine-grained one, involving specific software modules/objects and 
potentially very small hardware components such as single sensors or actuators. The 
reconfiguration process is, at its core, a search through different combinations of 
mobile object adapters for the sets that can be used on currently available hardware 
resources. Such combinations must be viable (supported by hardware and involving 
available software), must meet critical system requirements, must not violate any 
system constraints, and must provide optimal utility given available resources. 
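A brute-force Python sketch can make the search concrete. It is purely illustrative: the adapter fields, requirement names and bandwidth figures are hypothetical. Exhaustive enumeration is also exponential in the number of adapters, which is why a real configuration generator has to explore the space more intelligently. A combination is accepted only if it passes three checks:

* it is viable, meaning all required hardware is available,
* it satisfies every critical requirement,
* it respects the bandwidth constraint.

Among the accepted combinations, the one with the highest utility wins.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Adapter:
    """Hypothetical mobile object adapter / software module."""
    name: str
    requires_hw: FrozenSet[str]  # hardware resources it needs
    provides: FrozenSet[str]     # requirements it satisfies
    bandwidth: float             # network load it adds (illustrative units)


def best_configuration(adapters: List[Adapter], available_hw: Set[str],
                       utility: Dict[str, float], critical: Set[str],
                       bandwidth_limit: float) -> Optional[Tuple[FrozenSet[str], float]]:
    """Exhaustive search for the highest-utility viable configuration."""
    best: Optional[Tuple[FrozenSet[str], float]] = None
    for r in range(len(adapters) + 1):
        for combo in combinations(adapters, r):
            if any(not a.requires_hw <= available_hw for a in combo):
                continue  # not viable on the remaining hardware
            provided = set().union(*(a.provides for a in combo))
            if not critical <= provided:
                continue  # critical requirements are mandatory
            if sum(a.bandwidth for a in combo) > bandwidth_limit:
                continue  # violates the network constraint
            score = sum(utility.get(req, 0.0) for req in provided)
            if best is None or score > best[1]:
                best = (frozenset(a.name for a in combo), score)
    return best


adapters = [
    Adapter("abs_brake", frozenset({"wheel_speed_sensor", "brake_actuator"}),
            frozenset({"braking", "abs"}), bandwidth=40),
    Adapter("basic_brake", frozenset({"brake_actuator"}), frozenset({"braking"}), bandwidth=10),
    Adapter("acc", frozenset({"radar", "brake_actuator"}),
            frozenset({"adaptive_cruise"}), bandwidth=50),
]
utility = {"abs": 5.0, "adaptive_cruise": 3.0}
hw = {"wheel_speed_sensor", "brake_actuator", "radar"}
print(best_configuration(adapters, hw, utility, {"braking"}, 95))              # abs_brake + acc
print(best_configuration(adapters, hw - {"radar"}, utility, {"braking"}, 95))  # abs_brake only
```

When the radar fails, the search drops adaptive cruise but keeps ABS. When the wheel-speed sensor fails instead, it falls back to basic braking plus adaptive cruise.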
Eventually, reconfiguration will be done on-line in real-time, but for now we are
concentrating on providing reconfiguration as a quick-turn off-line operation to make 
the problems tractable in the near-term. For the car example, ideally reconfiguration 
in response to a component failure is done while driving, but in the near-term we will 
instead assume that the car is pulled to the side of the road before reconfiguration 
takes place, and then can resume operation once a new configuration is established. 



3. The reconfiguration manager 

In a system with an automatic reconfiguration mechanism, graceful degradation 
becomes fairly easy to accomplish. Whenever the failure of a component is detected, 
a new configuration is installed to obtain maximal functionality using remaining 
system resources, resulting in a system that still functions, albeit with lower overall 
utility. Designers using such an approach do not necessarily have to examine each 
combination of faults to specify designated configurations, but rather rely upon a 
generalized reconfiguration engine to deal with any combination of faults as it 
actually happens. 

The RoSES reconfiguration manager is made up of the following abstract 
components: 

• Fault Discovery/System Model - The reconfiguration manager can either 
start with a system model and then cut out pieces whenever it discovers a 
subsystem is faulty (a Fault Discovery mechanism) or it can build a System 
Model from scratch by asking each working component to describe itself.
The concept is the same - the reconfiguration manager must know what
sensors and actuators are operational before it builds a configuration. 

• Configuration Generator - A means to examine the extremely large search 
space and intelligently choose candidate configurations. To ensure only valid 
configurations are chosen, the Configuration Generator would generate 
candidates from a Dependency Model and filter them with a Validity Checker. 

o Dependency Model - Certain elements of a configuration may require 
or restrict other elements, either by requiring they be present, absent or 
placed in a particular manner. An example of the latter might occur in 
an automobile, where use of a particular braking algorithm would 
require the same algorithm be used on all other brake actuators. Such 
dependencies define the search space from which the configuration 
generator may draw candidates. 

o Validity Checker - Ensures only valid configurations are considered. 
Ensures the configuration would be schedulable, is consistent (e.g. 
consumer algorithms can properly partake of producer data), and 
consumes no more resources than are available. 

• Cost Model - Allows comparison of various configurations. Cost models may 
be fairly complex, as they may become scenario-specific. 

• Device Customization - An adapter loader deploys the chosen configuration 
throughout the system. Over the low bandwidth networks common to 
distributed embedded systems, real-time process migration is unlikely. 
Rather, the deployment will transfer small bits of state to prepositioned
executables or move code while the system is off-line. 
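The interplay of Configuration Generator, Dependency Model, Validity Checker and Cost Model can be sketched in Python using the braking example from the text. The per-algorithm CPU loads, data needs and utilities below are hypothetical. The validity check is reduced to a per-node CPU capacity test and a producer/consumer data test; real schedulability analysis would be considerably more involved.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Set

Config = Dict[str, str]  # brake actuator node -> braking algorithm

# Hypothetical per-algorithm properties.
CPU_LOAD = {"abs": 0.6, "basic": 0.2}
CONSUMES = {"abs": {"wheel_speed"}, "basic": set()}
UTILITY = {"abs": 5.0, "basic": 1.0}


def generate(nodes: List[str], algorithms: List[str]) -> Iterator[Config]:
    """Configuration Generator restricted by the Dependency Model: the same
    braking algorithm must be used on all brake actuators."""
    for choice in product(algorithms, repeat=len(nodes)):
        if len(set(choice)) == 1:
            yield dict(zip(nodes, choice))


def is_valid(config: Config, cpu_capacity: Dict[str, float], produced: Set[str]) -> bool:
    """Validity Checker: resources suffice and every consumer has its producer data."""
    return all(CPU_LOAD[alg] <= cpu_capacity[node] and CONSUMES[alg] <= produced
               for node, alg in config.items())


def choose(nodes: List[str], algorithms: List[str],
           cpu_capacity: Dict[str, float], produced: Set[str]) -> Optional[Config]:
    """Cost Model: pick the valid candidate with the highest total utility."""
    valid = [c for c in generate(nodes, algorithms) if is_valid(c, cpu_capacity, produced)]
    return max(valid, key=lambda c: sum(UTILITY[a] for a in c.values()), default=None)


wheels = ["fl", "fr", "rl", "rr"]
print(choose(wheels, ["abs", "basic"], {w: 1.0 for w in wheels}, {"wheel_speed"}))
# all four wheels use 'abs'
```

The dependency rule has a visible effect here. If a single node lacks the CPU capacity for ABS, the whole vehicle falls back to basic braking rather than mixing algorithms.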

The point in time when automatic reconfiguration is executed must be carefully 
managed. The cost of running a reconfiguration manager to determine the 
appropriate configuration can be significant and the network schedule may not have 
slack for adapters to be loaded. Especially in the case of a tightly scheduled and 
resource-constrained system, there may not be enough resources (CPU or network 
cycles, timing slack, etc.) to actually execute a reconfiguration step. Instead, we
envision automatic reconfiguration employed during extreme duress or down time. 
In the case of a crisis, breaking schedules to run the reconfiguration manager makes 
sense in that the system would be completely broken and have no chance of fulfilling 
its mission otherwise. Running the reconfiguration step may allow the system to 
find a configuration where some useful work can still be accomplished with the 
available resources. More typically, execution will happen when the system is down 
for maintenance, or at a slack time in the schedule. In an elevator, for instance, a 
reconfiguration step may occur during the otherwise idle time when the elevator has 
the doors open for passenger loading. An alternative approach may employ an 
incremental reconfiguration manager that can, in a series of steps, make small 
changes in the system configuration and eventually converge on a high-quality 
configuration. 
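An incremental reconfiguration manager can be sketched as a simple hill climber. This is one possible technique, not one prescribed by the text. Each step adds or removes a single module and accepts the best valid improvement. The search stops at a local optimum or when the step budget is exhausted. The budget is one way to trade configuration quality against decision time. The scoring function and module names are hypothetical.

```python
from typing import Callable, FrozenSet, Optional, Set

Config = FrozenSet[str]


def incremental_reconfigure(current: Config, modules: Set[str],
                            score: Callable[[Config], Optional[float]],
                            max_steps: int) -> Config:
    """Hill climbing over configurations; score() returns None for invalid ones.

    Assumes the starting configuration is valid.
    """
    current_score = score(current)
    for _ in range(max_steps):
        # Neighbours differ from the current configuration by exactly one module.
        candidates = [(score(current ^ {m}), current ^ {m}) for m in modules]
        candidates = [(s, c) for s, c in candidates if s is not None]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda sc: sc[0])
        if best_score <= current_score:
            break  # local optimum: no single change improves utility
        current, current_score = best, best_score
    return current


# Hypothetical example: three modules, at most two fit on the remaining hardware.
UTIL = {"a": 3.0, "b": 2.0, "c": 1.0}


def score(config: Config) -> Optional[float]:
    return None if len(config) > 2 else sum(UTIL[m] for m in config)


print(sorted(incremental_reconfigure(frozenset(), set(UTIL), score, max_steps=10)))  # ['a', 'b']
print(sorted(incremental_reconfigure(frozenset(), set(UTIL), score, max_steps=1)))   # ['a']
```

Every intermediate configuration is valid, so each step could be installed on its own. This fits the idea of making small changes in slack time.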



4. Reconfiguration as Logistical Support

Once a system has a reconfiguration mechanism, it can be exploited to provide major 
logistical benefits: the ability to make replacements with non-exact spares, a reduced 
reliance on legacy spares, and graceful upgrade capability. 

Replacing defective parts with non-exact spares is of great logistical utility. If 
achieved, this would free maintenance personnel from the burden of carrying every 
conceivable spare part. For example, they might just carry more capable, generalized 
spares instead of cost-optimized specific repair parts. These parts may be more 
expensive, but minimizing trips to pick up spares would reduce labor and 
transportation costs, often offsetting increased component costs. In emergencies, 
sub-optimal repair parts might be used to perform temporary partial repairs. While 
the military implications for compact spares inventories and non-exact battlefield 
repairs are obvious, such issues are also important for any system involving mobile 
maintenance personnel or systems with few installed systems served per supply 
depot. 

In addition, a major cost of supporting legacy systems is the need to provide 
legacy spares. In the US, a ten-year spare parts pipeline is mandated for 
automobiles, subjecting vehicle OEMs to interesting factory utilization challenges. 
Vehicle OEMs must weigh the warehousing costs of spare parts with the need to 
keep a factory line in operation to manufacture the parts. This mandate will be 
increasingly challenging as more and more automobile subsystems involve digital 
electronics - entire IC fabrication and packaging processes may need to be kept 
operational far beyond their obsolescence merely to provide spare parts designed a 
decade earlier. 
An automatic reconfiguration mechanism may ease such logistic nightmares.
Rather than replacing a part with an exact duplicate, a non-exact spare may be 
employed. The reconfiguration mechanism can then be used to find a different 
configuration that still provides for the same level (or perhaps an enhanced level) of 
functionality. By building updated sensors and actuators capable of several different 
algorithms (i.e., containing several different mobile object adapters), system
designers will fulfill requirements for legacy spares. Such a situation is analogous to 
providing legacy device drivers for a computing device, and is probably no more 
costly. 

Ultimately, it is important to gracefully reintegrate a repaired component as well 
as to reconfigure in the face of a component failure. As subsystems are repaired or 
replaced, the reconfiguration manager determines configurations that can use the 
added resources to restore functionality. 

In addition, reconfiguration allows access to configurations beyond the original 
product design. If a repair is made with a replacement part having superior 
performance, reintegration of the repair part is not just a repair, but also a system 
upgrade. Beyond that, it is possible that new components (and associated abstract 
functionality blocks and software modules) can be added to perform field upgrades 
using the same approach as that employed for reintegrating repair components. 

In fact, graceful degradation and upgrade via reconfiguration are simply ways of 
moving down or up the lattice of points in the product family architecture. When 
some hardware breaks or is inserted, it is as if a different model in the PFA has been 
realized. The reconfiguration manager is responsible for controlling the motion 
within the PFA lattice - by choosing which is the best collection of features to install 
on the available hardware. 



5. Problems with Reconfiguration 

Reconfiguration is not a panacea. If it were, it would already be in widespread use in 
almost every distributed embedded system. Some of the challenges discussed in this 
section are merely research challenges. Others are fundamental to the types of 
systems being built and will remain formidable barriers for those applications. 

5.1 Debugging and Technical Support

One of the prime reasons not to make use of a reconfiguration framework is the 
desire of designers and developers to maintain strict control over the system, and 
thus simplify debugging and technical support tasks. The existence of a 
reconfiguration mechanism allows for a wide variety of system states, and 
determining proper system operation in each of them is infeasible - in fact, it is
nearly impossible to do even for a single configuration.

The debugging problem may be alleviated with adherence to a carefully 
controlled architecture. In the same way that the abstraction of an object-oriented 
system reduces overall complexity and assists with interface compatibility, 
reconfiguration is much easier when the adapters fit well defined and properly 
abstracted logical interfaces. The extent to which architecture may actively support 
reconfiguration is an interesting research problem being addressed by the RoSES
project. 

Technical support can also be a challenge. When a user reports an error or 
problem, knowledge of the current configuration is useful. The reconfiguration 
manager must scrupulously log all configuration changes and make configuration 
data available to the problem resolution team. The difficulty grows with the
frequency of configuration changes. In a system where reconfiguration is only
executed during maintenance, the configuration data will be easier to maintain. If,
however, reconfiguration happens often, say whenever a vehicle is started or
whenever an elevator’s doors are opened, then it is difficult to track exactly what
the configuration was at the time of any particular problem.
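A minimal Python sketch of this logging requirement follows. Configuration changes are appended with a timestamp, and a binary search recovers the configuration that was active when a problem was reported. The class and method names are hypothetical.

```python
import bisect
from typing import FrozenSet, List, Optional


class ConfigurationLog:
    """Append-only log of configuration changes (hypothetical design)."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._configs: List[FrozenSet[str]] = []

    def record(self, time_s: float, config: FrozenSet[str]) -> None:
        """Log a newly installed configuration; entries must arrive in time order."""
        if self._times and time_s < self._times[-1]:
            raise ValueError("entries must be recorded in time order")
        self._times.append(time_s)
        self._configs.append(config)

    def active_at(self, time_s: float) -> Optional[FrozenSet[str]]:
        """Configuration in force at time_s (None if before the first entry)."""
        i = bisect.bisect_right(self._times, time_s)
        return self._configs[i - 1] if i else None


log = ConfigurationLog()
log.record(0.0, frozenset({"abs_brake", "acc"}))
log.record(120.0, frozenset({"abs_brake"}))  # e.g. after a radar failure
print(sorted(log.active_at(60.0)))   # ['abs_brake', 'acc']
print(sorted(log.active_at(150.0)))  # ['abs_brake']
```

The log grows by one entry per reconfiguration. Frequent reconfiguration, such as on every vehicle start, therefore increases storage and transfer needs, but lookup stays logarithmic.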

5.2 Certification Challenges 

Many applications, often those in public service or extremely safety-critical, need to 
be approved by a certification authority. In the US, nuclear power plants must pass 
specification, design and implementation verification by the Nuclear Regulatory 
Commission. The Federal Aviation Administration certifies avionic and flight 
control systems, while some security systems are in the purview of the National 
Security Agency. A reconfiguration mechanism may increase the certification costs, 
as the developers now must ensure the certifiers are comfortable with the 
reconfiguration mechanism and the manner in which configurations are chosen and 
deployed. 

Any gains from supporting reconfiguration would come when a later version of 
the product must be certified. If the regulatory agency understands reconfiguration 
and is comfortable with the implementation, then re-certification merely involves 
checking that any changed subsystems conform to the same logical interface. 

In the case of safety-critical systems, the loss of system configuration control due 
to automatic reconfiguration may be deemed too great a risk. In such a case, 
designers can pursue a separation strategy whereby the safety critical functionality is 
partitioned away from all other features. Reconfiguration could then be enabled only 
for non-critical functionality. This is a common strategy, for instance, in vehicles 
where one network is employed for engine control, braking, etc. and another network 
is used for the power windows, door locks, and emission control. 

5.3 Error Detection, Failover and Reconfiguration 

Error detection is a surprisingly difficult task. Many fault-tolerant systems dodge 
this problem by assuming a fail fast, fail stop fault model, wherein the node or 
process is assumed to quickly shut down after a fault occurrence. For such a fault 
model, a simple heartbeat message, or the node's fulfillment of its portion of a 
network schedule signals the reconfiguration manager that all is well. Covering 
more complex fault models will require more enterprising error detection schemes. 
Robust mechanisms are needed to ensure the reconfiguration manager knows of a 
fault, with particular attention paid to cases where only part of a node fails. In such 
scenarios, it is probably less than optimal to shut down the entire node, especially if 
the failure only affected a portion of the sensors or actuators hosted at the node. 
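Per-component heartbeat supervision is one simple way of detecting fail-stop faults at a finer granularity than whole nodes. It could look like the Python sketch below. The component naming scheme and timeout are hypothetical. The point is that a silent sensor on an otherwise working node is reported on its own, so only that sensor needs to be removed from the configuration.

```python
from typing import Dict, Set


class HeartbeatMonitor:
    """Tracks the last heartbeat of each sensor/actuator (illustrative sketch)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str, now_s: float) -> None:
        """Record that a component reported in at time now_s."""
        self.last_seen[component] = now_s

    def failed(self, now_s: float) -> Set[str]:
        """Components whose last heartbeat is older than the timeout."""
        return {c for c, t in self.last_seen.items() if now_s - t > self.timeout_s}


monitor = HeartbeatMonitor(timeout_s=0.05)
monitor.heartbeat("node3/wheel_speed_fl", 0.00)
monitor.heartbeat("node3/brake_fl", 0.00)
monitor.heartbeat("node3/brake_fl", 0.04)
print(monitor.failed(0.08))  # {'node3/wheel_speed_fl'}
```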




Note that faults in the reconfiguration manager itself can be handled through 
standard fault tolerance means and are not of significant interest (and, even if the 
manager fails completely, a currently loaded configuration would still be able to 
operate). 

Once a failure is detected, prompt configuration switches are necessary. A 
reconfiguration manager that performs incremental changes might be useful to 
accomplish on-the-fly configuration changes. Additionally, an ability to balance 
configuration quality vs. decision time seems attractive. 

5.4 Multi-vendor Challenges 

When a single team is responsible for developing an entire system, reconfiguration 
can be an elegant technology. However, much like other software, if a system is 
built by integrating components from multiple vendors or organizations, some 
special design and legal challenges emerge. 

Designing for cross-vendor reconfiguration. At its core, reconfiguration takes
advantage of some extra resources to install functionality. The extra resources are
provided by design or by freeing them from lower-priority uses. In a multi-vendor
environment, the extra resources may be taken from one vendor’s unit in order to
provide extra functionality to a unit from a different vendor. The first vendor may
object, because providing these resources makes its unit more costly than any
non-reconfigurable competing units.

Liability. It is not at all clear how the liability for an accident or failure would be 
allocated in a system capable of reconfiguration. Determining the origin of the error 
is complex, as described in Section 5.1. In general, if a module written by vendor A
were installed on vendor B’s device by a reconfiguration manager provided by 
vendor C, a jury could easily find any of the parties liable in the case of an incident - 
especially in cases of miscommunication between A, B and C. 



6. Related Work 

Reconfiguration mechanisms are frequently introduced in the co-design field, where 
reconfiguration is used to change the programming of a field programmable gate 
array (FPGA) or other integrated circuit [6]. System-wide reconfiguration in the
co-design field is typically seen as a synthesis problem, not the composition approach
we take. 

General organizational questions about distributed embedded systems have been 
examined in several different manners. Amorphous computing attempts to apply 
biological processes to create self-organizing and, hopefully, fault-tolerant 
arrangements among sensors and actuators [2]. Such approaches are in an early 
research stage and have yet to address or raise many real world challenges such as 
certification and liability. 

Graceful degradation is the subject of few research papers. One of them, [4],
illustrates the complexities in pre-planning the configuration lattice. [7] describes an
industrial project to develop a system that gracefully degrades. It also points out 
how difficult this can be for a centralized system - the product included a reasoning 
engine and detailed models, not only of the subsystems, but also a physics-based
model of the environment. 

The system vision of federated sensors and actuators is similar to those espoused 
by many middleware technologies such as Jini [3] and CORBA [5]. We expect to
find such middleware to be very useful for implementing our vision of fine-grained
mobile object adapters, although both technologies are currently a bit too
resource-intensive for many embedded system projects.



7. Conclusion 

Graceful degradation is an attractive middle ground between the expensive
fault-tolerance of modular redundancy and the low cost of non-robust systems.
Unfortunately, graceful degradation is difficult to achieve in a systematic manner.
The system architecture seems to be critical to achieving smooth degradation steps.
If less useful functionality is architecturally bound to vital functions (for instance,
by being part of the same system modules), then it cannot be shed to free resources
when a fault appears.

We think that a PFA based on reconfiguration mechanisms provides an 
appropriate framework in which to design and reason about a system’s ability to 
gracefully degrade. As a means to explore our ideas, we have begun the RoSES 
project. RoSES will also tackle some interesting research challenges that span
both technical and business concerns. We expect RoSES to demonstrate a product 
family approach combined with a reconfiguration infrastructure that will provide the 
advantages of graceful degradation, graceful upgrades, and reduced logistical cost 
through the use of non-exact spares. 
An embedded system is in constant interaction with its environment.
It can consist of several, possibly distributed, components
communicating with each other using interfaces. Collective
behaviour between the system and its environment may be
nondeterministic or random, and can include continuous quantities.
The effects of the collective behaviour on the architecture of the
system are non-obvious and should be considered before defining
the interfaces between system components. This calls for methods
capable of expressing complex collective behaviour and providing
proper structuring of complex specifications. In this paper we
discuss such capabilities in conjunction with the DisCo method.



1. Introduction 

Embedded systems are inherently reactive. They are in constant interaction with 
their environment, i.e., the context in which they operate. These systems can consist 
of several, possibly distributed, components communicating using interfaces. 

Interaction with the environment may include nondeterministic, random or con- 
tinuous behaviour. These sources of complexity are usually hidden behind external 
interfaces. Conventionally, the specification process of an embedded system starts by 
identifying the external interfaces as well as the components of the system together 
with the internal interfaces between the components. The process continues by de- 
fining each component separately. However, it is often the case that as the develop- 
ers gain more insight into the collective behaviour of components, the internal inter- 
faces have to be revised. This often leads to other changes as well. 

To alleviate the problem, the process should be started at a higher level of
abstraction and the definition of interfaces should be postponed. The collective
behaviour should be captured before partitioning the system into components and
defining the interfaces (Kurki-Suonio & Mikkonen, 1998, 1999). This leads naturally
to closed specifications, which are self-contained in the sense that they describe the
behaviour of the total system, including the environment, rather than individual
components.

Closed modelling does not mean that specifications cannot be modularised. However,
modularity in specification languages is unlikely to be the same as in programming
languages (Maibaum, 2000). Instead of reflecting the implementation architecture,
closed specifications should exploit abstractions of the final system as units of
modularity (Kurki-Suonio & Mikkonen, 1999). Consequently, different sources of
complexity in the environment can be isolated into specification modules, which
potentially affect many implementation-level components.

The method used in capturing collective behaviour should support the expression of
nondeterministic, random and continuous behaviour. Moreover, it should effectively
address separation of concerns at high levels of abstraction. Nondeterminism is a
built-in feature in many specification formalisms, so in this paper the focus is on
environment modelling, especially concerning continuous and random behaviour in
closed specifications of embedded systems. Furthermore, an example of a distributed
water tank system is given to illustrate the structuring of complex specifications. In
the sequel we use DisCo, which is a specification method for real-time reactive
systems. The rest of this paper is structured as follows. Section 2 discusses the
specification of embedded systems. In Section 3 the DisCo method is introduced, and in
Section 4 the example is given. Section 5 contains some concluding remarks.



2. From structural view to collective behaviour 

2.1. Specifying embedded systems

Embedded systems have some characteristics which distinguish them from other 
systems. One of these characteristics is that they are inherently reactive. Reactive 
means that they are in constant interaction with their environment, i.e., the context in 
which they operate. Embedded systems may consist of several, possibly distributed,
components. These components communicate with each other using internal inter- 
faces and with the environment of the system using external interfaces. 

Interfaces provide the means to implement collective behaviour between the
components and between the system and its environment. Interfaces play a very
important role in decomposing complicated systems. Well-defined and documented
interfaces enable concurrent work, facilitate maintenance and enable reuse of
components. Once defined, the interfaces should not be changed. However, components
and interfaces describe the static structure of the embedded system, not the
collective behaviour.

2.2. Why consider properties of the environment?

An embedded system may interact with its environment in complex ways. This
interaction may include nondeterminism, randomness, continuous quantities, etc. By
nondeterminism we mean that no statistical properties can be found, and by
randomness that there are some statistical properties which can be described.
Different sources of complexity in the environment may pose non-functional
requirements on the architecture of the system. These requirements may involve, for
example, timeliness, performance and reliability. The effects of these requirements
on hardware/software partitioning, for example, are usually non-obvious.

Figure 1. Structural view of an embedded system.

As an example, a fly-by-wire or an anti-lock braking system interacts with a user
(driver or pilot) and with physical phenomena. Consider a real-time requirement for
bounded response, requiring that within a certain time interval after a user activity
certain changes in the physical environment occur. Obviously, the effects of such a
requirement on the architecture of the system are non-trivial.

2.3. From structural view to collective behaviour 

In Figure 1, a structural view of an embedded system is depicted. This is usually the
most abstract view of the system, consisting of components and internal and external
interfaces. Nondeterministic behaviour of a user and random behaviour of clients
requesting service are hidden behind interfaces. Moreover, continuous behaviour is
hidden behind interfaces which provide access to the sensors and actuators. Given
this kind of view of the system, a designer has to come up with well-defined and
preferably permanent internal interfaces. However, as the developers gain more
insight into the collective behaviour between the components and between the
components and the environment, a need to revise the interface definitions may
arise, which probably leads to other changes as well.

To alleviate the problem, after the requirements phase, the specification of the
system should be started at a higher level of abstraction. Definition of interfaces
between components should be postponed. The collective behaviour should be
captured before partitioning the system into components with fixed interfaces
(Kurki-Suonio & Mikkonen, 1998, 1999). After partitioning, the components can be
further designed using conventional methods. Those parts of the closed specification
which belong to the environment need no explicit implementation.

Figure 2. Collective behaviour view.

In Figure 2 the capturing of collective behaviour is depicted. Interfaces are not
defined yet, but components of the system and entities of the environment, e.g.
users, clients and physical phenomena, are described in a very abstract way. This
view does not describe static structure but dynamic co-operation. It may include
nondeterministic behaviour with the user, random behaviour with the clients and
continuous behaviour with the physical phenomenon. The focus is on what the
collective behaviour is rather than how it is implemented (Kurki-Suonio & Mikkonen,
1998). After capturing this view, the definition of internal interfaces is more
likely to succeed.

In order to be useful in capturing complex collective behaviour, a specification 
method should allow expression of such behaviour. However, this may lead to speci- 
fications that are hard to understand and reuse. Therefore, the method should support 
separation of concerns in such a way that different sources of complexity in the envi- 
ronment can be specified modularly. Also, reuse of such modules should be possible. 

In closed specifications, the natural units of modularity are not the units of the
implementation architecture, but logical abstractions of the final system
(Kurki-Suonio & Mikkonen, 1999). Consequently, different sources of complexity should
be isolated into reusable specification modules.



3. The DisCo method 

DisCo (Järvinen et al., 1990; DisCo, 1999) is a state-based formal specification
method for reactive and distributed real-time systems. It is based on the joint action
theory (Back & Kurki-Suonio, 1988, 1989), where the focus is on capturing collective
behaviour at a high level of abstraction. The semantics of DisCo is defined in the
Temporal Logic of Actions (TLA) (Lamport, 1994). Concurrency is modelled by
interleaving, and specifications have an operational interpretation.

The basic notions of the DisCo language are classes and multi-object actions. One 
or more objects, which are instances of classes, participate in actions. As an exam- 
ple, definitions of class C and action A are given below: 

class C = {i : integer}

action A(c1, c2, c3 : C; r : real):
    r > 4.6
    c1.i' = c2.i' = c3.i' = max(c1.i, c2.i, c3.i),

where the unprimed names refer to the values of variables in the state before the
action execution and the primed names to the state following the action execution.
If the guard of the action, r > 4.6 in the above, evaluates to true, the action is said
to be enabled, i.e., it can be executed. After execution of action A, the values of the
i attributes of the participating objects equal the maximum of their values before the
action was executed.
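To make this execution model concrete, the following Python sketch mimics action A. It is only an illustration under our own assumptions: in DisCo the participants and the value of r are chosen nondeterministically, whereas here the caller supplies them, and the names `C`, `guard_A` and `execute_A` are hypothetical. The primed values are computed from the pre-state before any assignment, so the update behaves atomically, as in the specification.

```python
from dataclasses import dataclass


@dataclass
class C:
    """Counterpart of the DisCo class C = {i : integer}."""
    i: int


def guard_A(c1: C, c2: C, c3: C, r: float) -> bool:
    """Guard of A: the action is enabled only if r > 4.6."""
    return r > 4.6


def execute_A(c1: C, c2: C, c3: C, r: float) -> bool:
    """Execute A atomically if it is enabled; return True if it was executed."""
    if not guard_A(c1, c2, c3, r):
        return False
    # The primed values are computed from the unprimed (pre-state) values ...
    new_i = max(c1.i, c2.i, c3.i)
    # ... and then installed together as the post-state.
    c1.i = c2.i = c3.i = new_i
    return True


a, b, c = C(1), C(7), C(3)
print(execute_A(a, b, c, 5.0), a.i, b.i, c.i)   # True 7 7 7
print(execute_A(a, b, c, 4.0))                  # False: guard r > 4.6 fails
```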

DisCo specifications are generic in the sense that an unbounded number of objects
is assumed. Furthermore, the action to be executed next, its participants and
parameter values are chosen nondeterministically. This provides a basis for
specifying reusable patterns of collective behaviour. Specifications are refined from
a high level of abstraction towards an implementation description using stepwise
refinement. The refinement mechanism is superposition, which preserves safety
properties (“something bad never happens”) by construction. One refinement step is
described in a layer, which is a unit of modularity. A system is always specified
together with its assumed environment, i.e., specifications are closed. Also,
specifications can be composed, and actions can be synchronized to be executed in
parallel. To illustrate the use of superposition, C is extended with attribute b, and
new conjuncts are added to the guard and body of A:



C = C + {b : boolean}

action A(c1, c2, c3 : C; r : real): refines A(c1, c2, c3, r)
    3.5 < r
    c1.b' = c2.b' = c3.b' = true.
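A minimal Python sketch of what this superposition step does. The base action A is re-declared here so that the block stands alone, and all function names are hypothetical. The refined guard is the old guard conjoined with 3.5 < r. The refined body keeps the old assignments and only adds assignments to the new attribute b, which is why properties of the old variables carry over. Note that in this particular example r > 4.6 already implies 3.5 < r, so the strengthened guard happens to be equivalent to the original one.

```python
from dataclasses import dataclass


@dataclass
class C:
    i: int
    b: bool = False        # attribute added by C = C + {b : boolean}


def base_guard(c1: C, c2: C, c3: C, r: float) -> bool:
    return r > 4.6


def base_body(c1: C, c2: C, c3: C, r: float) -> None:
    new_i = max(c1.i, c2.i, c3.i)
    c1.i = c2.i = c3.i = new_i


def refined_guard(c1: C, c2: C, c3: C, r: float) -> bool:
    # Superposition may only strengthen the guard: old guard AND new conjunct.
    return base_guard(c1, c2, c3, r) and 3.5 < r


def refined_body(c1: C, c2: C, c3: C, r: float) -> None:
    base_body(c1, c2, c3, r)           # old assignments are kept unchanged
    c1.b = c2.b = c3.b = True          # new assignments touch only the new attribute


def execute_refined(c1: C, c2: C, c3: C, r: float) -> bool:
    if not refined_guard(c1, c2, c3, r):
        return False
    refined_body(c1, c2, c3, r)
    return True


# Whenever the refined action is enabled, the old action is enabled as well.
for r in (0.0, 3.5, 4.0, 4.6, 4.7, 10.0):
    assert not refined_guard(C(0), C(0), C(0), r) or base_guard(C(0), C(0), C(0), r)
```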

Logical abstractions can be given as different logical layers (Mikkonen, 1999)
affecting potentially many implementation-level components. Because DisCo
specifications are operational, their finite instances can be validated using
simulation. The method includes tool support for graphically animated simulation of
specifications (Systä, 1991). Furthermore, the formal basis enables verification using
theorem proving (Kellomäki, 1997) and model checking (Aaltonen et al., 2000).



3.1. Modelling time 
Real time is usually considered to be the most important continuous quantity, at least
concerning embedded systems. It can be incorporated in the above scheme as follows
(for a more detailed discussion, see Kurki-Suonio & Katara, 1999). It is assumed that
actions are executed instantaneously. A clock variable $\Omega$, ranging over the
nonnegative reals and initialised to 0, records time from the beginning of a
behaviour. In each action, the time at which it is executed is given by the value of an
implicit parameter $\tau$. Furthermore, the guards of all actions are implicitly
strengthened by the conjunct $\Omega \le \tau \le \min(\Delta)$, where $\Delta$
denotes a multiset of deadlines. Additionally, the conjunct $\Omega' = \tau$ is added
to the bodies of all actions.

Both minimal separation and bounded response requirements can be expressed using
these constructs. A minimal separation d between executions of actions A and B can be
required by strengthening the guard of action B with the conjunct $\tau \ge T_A + d$,
where $T_A$ denotes the most recent execution moment of A. Deadlines are needed for
bounded response requirements. When a deadline $\tau + d$ is required, a conjunct of
the form $x' = \Delta_{on}(d)$ is given in the action body. It adds the deadline to
$\Delta$ and stores it in a variable x. Until some action removes the deadline with
$\Delta_{off}(x)$, the implicit conjunct $\tau \le \min(\Delta)$ in all guards prevents
$\Omega$ from advancing beyond this deadline. Initially, $\Delta$ can hold initial
deadlines. A type time, a synonym of type real, can be used in timed specifications.
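The implicit timing constructs can be sketched as a small Python class. This is a hypothetical illustration, not DisCo's semantics or tool interface. `TimedSystem`, `delta_on` and `delta_off` are our names for $\Omega$, $\Delta_{on}$ and $\Delta_{off}$. $\Delta$ is kept as a `collections.Counter` multiset, and an empty $\Delta$ is assumed to impose no bound. The usage lines show a bounded response requirement: a pending deadline stops time from advancing until a response action removes it.

```python
import math
from collections import Counter


class TimedSystem:
    """Clock Ω and deadline multiset Δ of the implicit timing scheme (a sketch)."""

    def __init__(self, initial_deadlines=()):
        self.omega = 0.0                              # Ω starts at 0
        self.deadlines = Counter(initial_deadlines)   # Δ as a multiset

    def min_deadline(self):
        # min of an empty Δ is +infinity: no pending deadline constrains time.
        return min(self.deadlines) if self.deadlines else math.inf

    def execute(self, tau, guard=lambda: True, body=lambda: None):
        """Run an action at time τ if Ω ≤ τ ≤ min(Δ) and its own guard hold."""
        if not (self.omega <= tau <= self.min_deadline() and guard()):
            return False
        body()
        self.omega = tau                              # implicit Ω' = τ
        return True

    def delta_on(self, tau, d):
        """Δ_on(d): add the deadline τ + d to Δ and return it (to store in x)."""
        deadline = tau + d
        self.deadlines[deadline] += 1
        return deadline

    def delta_off(self, deadline):
        """Δ_off(x): remove one occurrence of the stored deadline from Δ."""
        if self.deadlines[deadline] > 1:
            self.deadlines[deadline] -= 1
        else:
            self.deadlines.pop(deadline, None)


# Bounded response: a request at τ = 1 demands a response within d = 5.
clock = TimedSystem()
pending = {}
clock.execute(1.0, body=lambda: pending.update(x=clock.delta_on(1.0, 5.0)))
print(clock.execute(7.0))   # False: time cannot pass the deadline 6.0
print(clock.execute(4.0, body=lambda: clock.delta_off(pending["x"])))  # True
print(clock.execute(7.0))   # True: the deadline has been removed
```

A minimal separation requirement needs no extra machinery in this sketch. It is simply passed as the action's own `guard`, for example `lambda: tau >= t_a + d` for a recorded execution time `t_a`.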

3.2. Modelling other continuous quantities 

In TLA values of state variables can change only in actions. However, we can think 
of storing samples of a continuous quantity in a state variable. Periodic sampling can 
be modelled as a periodic action with non-deterministic parameter value representing 
the value of the quantity at a moment when the action is executed. The samples and 
time can be used to give an approximation of the quantity (Kurki-Suonio, 1993). 
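Periodic sampling and the resulting approximation can be sketched as follows. The piecewise-linear interpolation is only one plausible choice, assumed here for illustration, since the text does not prescribe a particular approximation. The names `sample_periodically`, `approximate` and `level` are hypothetical, and `level` stands in for the nondeterministic parameter value that DisCo would supply.

```python
from bisect import bisect_right


def sample_periodically(quantity, period, count):
    """Periodic sampling action executed at τ = 0, period, 2*period, ...

    In DisCo the sampled value is a nondeterministic parameter; here it is read
    from a hypothetical function `quantity(τ)` standing in for the environment.
    """
    return [(k * period, quantity(k * period)) for k in range(count)]


def approximate(samples, t):
    """Piecewise-linear approximation from (τ, value) samples sorted by τ."""
    times = [tau for tau, _ in samples]
    if t <= times[0]:
        return samples[0][1]
    if t >= times[-1]:
        return samples[-1][1]
    k = bisect_right(times, t)                 # times[k-1] <= t < times[k]
    (t0, v0), (t1, v1) = samples[k - 1], samples[k]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def level(tau):
    """Hypothetical continuous quantity: a tank draining at 2 units per hour."""
    return 100.0 - 2.0 * tau


samples = sample_periodically(level, period=1.0, count=6)
print(approximate(samples, 2.5))               # 95.0
```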

Modelling an action that has to be executed at a moment when the quantity 
reaches some limit poses a problem in determining the right moment of time when 
the execution should happen. Because the moment the quantity reaches the limit is 
not known beforehand, a time deadline for the action cannot be given. 

Time deadlines can be generalized to other continuous quantities as well. It is
assumed that between executions of actions each continuous quantity of interest
changes monotonically. This means that between executions of actions their values
are either non-increasing or non-decreasing. For each continuous quantity q of
interest, multisets $\Delta_q^-$ and $\Delta_q^+$ are introduced to hold lower and
upper hybrid deadlines concerning the value of the quantity. Additionally, all guards
are strengthened by the conjunct

$$\max(\Delta_q^-) \le p_q \le \min(\Delta_q^+),$$

where $p_q$ is the parameter corresponding to the quantity. In (Katara, 2000) these
constructs were used to model physical mobility.
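The strengthened guard can be sketched in Python as a predicate on the parameter $p_q$. The sketch rests on our own assumptions: the multisets are plain lists, an empty multiset imposes no bound, and the numbers are hypothetical. In this reading, a value that falls monotonically between actions cannot move past a pending lower deadline. Some action must therefore execute with $p_q$ at the limit (and remove the deadline), so reaching the limit becomes an action even though its moment is not known in advance.

```python
import math


def hybrid_guard(p_q, lower, upper):
    """Implicit conjunct max(Δ_q^-) ≤ p_q ≤ min(Δ_q^+).

    `lower` and `upper` are lists used as multisets; an empty multiset
    imposes no bound.
    """
    lo = max(lower) if lower else -math.inf
    hi = min(upper) if upper else math.inf
    return lo <= p_q <= hi


# A falling level with a pending lower hybrid deadline at 500 (hypothetical values):
lower, upper = [500.0], []
print(hybrid_guard(620.0, lower, upper))  # True: the level is still above the deadline
print(hybrid_guard(480.0, lower, upper))  # False: no action may observe 480
```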

3.3. Modelling stochastic behaviour 

In order to capture randomness occurring in the environment, it should be possible to
address statistical properties at the specification level. One way to introduce
randomness is to let action parameters exhibit some mean, variance and distribution.
As an example, consider a generic Poisson process modelled as follows:

class Poisson = {λ : constant real,
                 t : time}

where λ is the rate of the process and t is used to store a deadline for the next
Poisson arrival. Poisson arrival at rate λ is modelled as an action:

action Poisson_Arrival(p : Poisson; d : real(exp, 1/p.λ)):
    τ ≥ p.t
    p.t' = Δ_on(d)

The values of parameter d are exponentially distributed with mean 1/p.λ, where p is
a participant belonging to the class Poisson. The value of the parameter is used to
determine the next moment at which the action is executed.
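A minimal Python simulation of Poisson_Arrival. It assumes that the action fires exactly at the stored deadline p.t and omits the removal of old deadlines from Δ. `Poisson`, `poisson_arrival` and `simulate_arrivals` are hypothetical names. Note that `random.expovariate` takes the rate λ as its argument, so the drawn d has mean 1/λ, as required.

```python
import random


class Poisson:
    """Counterpart of class Poisson = {λ : constant real, t : time}."""

    def __init__(self, rate, first_deadline):
        self.rate = rate               # λ
        self.t = first_deadline        # deadline of the next arrival


def poisson_arrival(p, rng):
    """Execute Poisson_Arrival at τ = p.t and schedule the next arrival."""
    tau = p.t                          # guard τ ≥ p.t together with τ ≤ min(Δ): τ = p.t
    d = rng.expovariate(p.rate)        # d : real(exp, 1/p.λ), i.e. mean 1/λ
    p.t = tau + d                      # p.t' = Δ_on(d): next deadline τ + d
    return tau


def simulate_arrivals(rate, horizon, seed=0):
    """Arrival times of a Poisson process with the given rate on [0, horizon]."""
    rng = random.Random(seed)
    p = Poisson(rate, first_deadline=rng.expovariate(rate))
    arrivals = []
    while p.t <= horizon:
        arrivals.append(poisson_arrival(p, rng))
    return arrivals


arrivals = simulate_arrivals(rate=2.0, horizon=1_000.0, seed=1)
print(len(arrivals) / 1_000.0)         # empirical arrival rate, expected near λ = 2.0
```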

However, TLA does not support expression of statistical properties. This means 
that stochastic parameter values lack formal semantics. Nevertheless, statistical 
analysis can be employed. Also, stochastic parameter values could be used in simu- 
lation. 



4. Example of separating concerns 

To illustrate separation of concerns at a high level of abstraction, we present a
simple embedded system with a complex environment. The system consists of
distributed water tanks and customers, which visit tanks and consume water.
As customers consume water from a tank, the tank’s water level decreases and it can 
order more water from a truck. However, there is a considerable delay before the 
truck arrives and filling may be started. Furthermore, tanks may leak. The objective 
is to develop a system which guarantees that the tanks never run out of water. Be- 
cause of lack of space, only a brief summary of the specification is given. For the full 
specification, the reader is referred to (DisCo, 1999). 

In Figure 3 the structure of the specification is depicted. The specification starts
from a simple layer modelling only the functional aspects of the collective
behaviour. Layer Functional Specification includes the definition of classes tank,
customer and truck, associated relations, and some initial conditions which have to
hold in the initial state. There are also five actions: Start_Service and
Stop_Service, which model arrival and departure of customers; Order, which models
ordering of a water delivery; and Start_Filling and Stop_Filling, which model filling
of a tank.

There are also three generic layers. Layer Aperiodic Events uses constructs
introduced in Section 3.1 to define the generic class and actions to model scheduling
an aperiodic event (A_Schedule) and triggering it (see also Kurki-Suonio & Katara,
1999). The time interval between scheduling and triggering is given as a parameter d
in action A_Schedule. Actions A_Trigger≤ and A_Trigger= model triggering. The former
is used when the action corresponding to the event can be executed before d has
passed, and the latter when exactly time d must have passed since scheduling.
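The difference between the two triggering actions can be sketched as enabling conditions. The class and method names are hypothetical, times are plain floats, and `math.isclose` stands in for the exact equality used in the specification.

```python
import math


class AperiodicEvent:
    """Hypothetical counterpart of the generic aperiodic event layer."""

    def __init__(self):
        self.scheduled_at = None       # τ at which A_Schedule executed
        self.d = None                  # delay parameter d of A_Schedule

    def schedule(self, tau, d):
        # A_Schedule: remember the scheduling moment and the interval d.
        self.scheduled_at, self.d = tau, d

    def deadline(self):
        if self.scheduled_at is None:
            return math.inf
        return self.scheduled_at + self.d

    def trigger_le_enabled(self, tau):
        # A_Trigger≤: the event may fire at any moment up to d after scheduling.
        return self.scheduled_at is not None and self.scheduled_at <= tau <= self.deadline()

    def trigger_eq_enabled(self, tau):
        # A_Trigger=: the event fires exactly d after scheduling.
        return self.scheduled_at is not None and math.isclose(tau, self.deadline())

    def trigger(self):
        # Either trigger consumes the event, so it cannot fire twice.
        self.scheduled_at = self.d = None


event = AperiodicEvent()
event.schedule(tau=2.0, d=10.0)
print(event.trigger_le_enabled(5.0), event.trigger_eq_enabled(5.0))    # True False
print(event.trigger_le_enabled(12.0), event.trigger_eq_enabled(12.0))  # True True
```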

As described in Section 3.2, time deadlines can be generalized to other continuous
quantities as well. Layer Continuous Behaviour describes a generic continuous
quantity, including scheduling and triggering actions for both lower (Schedule⁻ and
Trigger⁻) and upper (Schedule⁺ and Trigger⁺) hybrid deadlines concerning the values
of the quantity. Furthermore, layer Poisson Process contains the generic class and
action introduced in Section 3.3.

Figure 3. Layers of the distributed water tank example.

The total specification is obtained by composing the functional layer with the
three generic layers and applying superposition to the composition. Instantiation of
the generic parts is done in the final layer. Moreover, it is required that in the
initial state there is a lower hybrid deadline corresponding to a notice level at
which more water should be ordered. Action Start_Service is synchronized with
Poisson_Arrival and A_Schedule, and action Stop_Service with A_Trigger=. Using
stochastic parameter values, the mean and variance of the service time are given.

Moreover, action Order is synchronized with Trigger⁻ and A_Schedule, modelling
reaching the notice level and scheduling a water delivery. Start_Filling is
synchronized with A_Trigger≤ and Schedule⁺, to model arrival of the water delivery
and setting an upper hybrid deadline that equals the capacity of the tank.
Furthermore, Stop_Filling is synchronized with Trigger⁺ and Schedule⁻, corresponding
to the water level reaching the capacity of the tank and setting a lower hybrid
deadline for the notice level again.

To find a correct value for the notice level, we cannot solely resort to the
worst-case scenario: an unbounded number of customers could arrive at the moment
action Order has been triggered and consume all the water at once. To illustrate how
statistical analysis can be used in this context, we have modelled water consumption
using a compound Poisson process (see, for example, Ross, 1997). Water consumption by
time t is given by the formula

$$X(t) = \sum_{i=1}^{N(t)} Y_i, \quad t \ge 0,$$

where N(t) is the number of customers who arrive by time t and $Y_i$ is the amount of
water the ith customer takes. We assume that $\{N(t), t \ge 0\}$ is a Poisson process
and $\{Y_i, i \ge 1\}$ a sequence of independent and identically distributed (IID)
random variables, also independent of the process $\{N(t)\}$.

The compound Poisson variable X(t) has mean $\lambda t\,E(Y_i)$ and variance
$\lambda t\,(\mathrm{Var}(Y_i) + [E(Y_i)]^2)$, where $\lambda$ is the rate of the
Poisson process $\{N(t)\}$, and $E(Y_i)$ and $\mathrm{Var}(Y_i)$ are the mean and
variance of $Y_i$. The mean and variance of the water consumption of one customer can
easily be obtained from the mean and variance of the time they receive service, if we
assume that the rate of water flow from the tank to the customer is constant.
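The mean and variance formulas can be checked against a quick Monte Carlo simulation. This sketch is our own. The normal distribution for $Y_i$ is an arbitrary choice made only for illustration, since the formulas need only $E(Y_i)$ and $\mathrm{Var}(Y_i)$. The parameter values (λ = 2 per hour, t = 10 hours, mean and variance of $Y_i$ both 20) are those of the numerical example below.

```python
import random
import statistics


def compound_poisson_moments(rate, t, mean_y, var_y):
    """E[X(t)] = λ t E[Y] and Var[X(t)] = λ t (Var[Y] + E[Y]^2)."""
    mean = rate * t * mean_y
    var = rate * t * (var_y + mean_y ** 2)
    return mean, var


def simulate_consumption(rate, t, mean_y, var_y, rng):
    """One sample of X(t): Poisson arrivals on [0, t], each taking a random amount.

    Amounts are drawn from a normal distribution purely for illustration.
    """
    total, clock = 0.0, rng.expovariate(rate)
    while clock <= t:
        total += rng.gauss(mean_y, var_y ** 0.5)
        clock += rng.expovariate(rate)
    return total


rng = random.Random(42)
samples = [simulate_consumption(2.0, 10.0, 20.0, 20.0, rng) for _ in range(20_000)]
print(compound_poisson_moments(2.0, 10.0, 20.0, 20.0))   # (400.0, 8400.0)
print(statistics.fmean(samples), statistics.variance(samples))
```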

Using Chebyshev’s inequality, it can be estimated that the probability of water
running out before the water delivery arrives is less than

$$p^*(x) = \frac{\mathrm{Var}(X(t))}{2\,(x - E(X(t)))^2},$$

where x is the notice level of the tank and t is the maximum delay before refilling
the tank. (The factor 2 halves the two-sided Chebyshev bound, which presumes that X(t)
is roughly symmetric about its mean.) The number of fillings of the tank during a
longer time period T is approximately

$$k(x) = \frac{E(X(T))}{V - (x - E(X(t)))},$$

where V is the capacity of the tank. Thus, the probability that the water does not run
out in time T is approximately

$$p = (1 - p^*(x))^{k(x)}.$$

It can be seen that the probability p is a function of the notice level x. If p is
given, a suitable notice level x can be solved numerically.

Suppose that the capacity of the tank is 10 000 litres, the maximum delay of water 
delivery is 10 hours and we want the probability of water not running out during one 
year to be 0.9. If two customers come per hour on the average and the amount of 
water one customer takes in litres has mean 20 and variance 20, it can be numerically 
solved that the notice level should be at 1665 litres. 
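The numerical solution can be reproduced with a simple bisection on p(x) - 0.9. This sketch rests on stated assumptions. One year is taken as 365 × 24 = 8760 hours, and the bracket [1000, 5000] is chosen by hand. p(x) is assumed to increase on that interval. This holds approximately because $k(x)\,p^*(x)$ is proportional to $1/((V-u)u^2)$ with $u = x - E(X(t))$, which decreases for $u < 2V/3$. With these inputs the root lies a little above 1665 litres, which rounds to the notice level stated above.

```python
def p_star(x, mean_t, var_t):
    """Halved Chebyshev estimate of running out before a delivery arrives."""
    return var_t / (2.0 * (x - mean_t) ** 2)


def fillings(x, mean_t, mean_T, capacity):
    """Approximate number of fillings k(x) during the long period T."""
    return mean_T / (capacity - (x - mean_t))


def p_no_shortage(x, mean_t, var_t, mean_T, capacity):
    """p = (1 - p*(x)) ** k(x)."""
    return (1.0 - p_star(x, mean_t, var_t)) ** fillings(x, mean_t, mean_T, capacity)


def notice_level(target, rate, delay, period, mean_y, var_y, capacity,
                 lo, hi, tol=1e-6):
    """Solve p(x) = target for x by bisection on [lo, hi]."""
    mean_t = rate * delay * mean_y                    # E(X(t)), t = maximum delay
    var_t = rate * delay * (var_y + mean_y ** 2)      # Var(X(t))
    mean_T = rate * period * mean_y                   # E(X(T))

    def f(x):
        return p_no_shortage(x, mean_t, var_t, mean_T, capacity) - target

    if not f(lo) < 0.0 < f(hi):
        raise ValueError("target probability is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid               # probability still too small: raise the level
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = notice_level(target=0.9, rate=2.0, delay=10.0, period=365 * 24.0,
                 mean_y=20.0, var_y=20.0, capacity=10_000.0,
                 lo=1_000.0, hi=5_000.0)
print(round(x))   # 1665
```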



5. Discussion 

Collective behaviour between a system and its environment affects the architecture 
of the system. Therefore, it should be captured before defining the interfaces be- 
tween components of the system. The interaction with the environment may include 
complex behaviour. However, many specification methods lack support for express- 
ing properties of the environment such as stochastic or continuous behaviour. In this 
paper we have investigated how to incorporate stochastic and continuous modelling 
into the DisCo method, which uses logical layering to modularise specifications. 

We have presented some ways to model complex environment behaviour. How- 
ever, as the underlying temporal logic does not provide direct means to express sta- 
tistical properties, advantages of formal specification are easily lost. Yet, this does 
not rule out the use of classical statistical analysis and simulation as ways to validate 
system behaviour. Furthermore, the example provides some evidence that logical
layering yields a useful separation of concerns even in the presence of complex
behaviour.

There is a lot of ongoing work in the area of hybrid stochastic systems. One future
direction would be to find a mapping between DisCo and some formalism capable of
expressing hybrid stochastic behaviour but not necessarily focusing on collective
behaviour. Moreover, much remains to be done to develop user-friendly tools that
support modelling of continuous and stochastic behaviour.
TEST CASE DESIGN FOR THE VALIDATION OF
COMPONENT-BASED EMBEDDED SYSTEMS 

Wolfgang Fleisch 

http://www.ias.uni-stuttgart.de 



The validation of functional and real-time requirements of control 
software for embedded systems is a difficult task. It usually requires
the electronic control unit (ECU) and the controlled hardware
components. But very often the ECU or hardware components are
not available for testing the control software at the beginning of
the development. This paper presents how test cases can be
designed from use cases and how component-based control
software can be validated without ECU and hardware components
by simulating the test cases in early development phases. To
obtain a format that can be tested with tool support, extended UML
sequence diagrams are used to formalise the sequences of events
specified in the use case scenarios. Since black-box components
are used for developing component-based applications, the
dynamic behaviour inside the components cannot be monitored
during simulation. However, the simulated dynamic behaviour is
observable on the connections between the software components.
The time-stamped events monitored and recorded in this way are
finally compared offline against the expected sequences of events
specified in the test cases. The offline comparison validates the
simulated behaviour by demonstrating the fulfilment of user
requirements and detects errors in case of contradictions. An
application example of an automotive wiper control system
demonstrates the capabilities of the presented test case design
and validation process.



1. Introduction 

1.1. Component-Based Embedded Systems 

Embedded systems are control systems which are embedded in a technical system. 
They are designed for calculating actions as a response to characteristic input values. 
Usually this task is performed by a microcontroller based electronic control unit 
(ECU) which communicates with its environment by sensors and actuators. 
Distributed embedded systems consist of a network of ECUs, which exchange 
information via a communication network. A typical example for a distributed 
embedded system is a modern car. Nearly all kinds of control and supervision 
functionality, e.g. anti-lock braking system, transmission control, fuel injection and
various body electronics features, is controlled by distributed ECUs. It can be
assumed that almost all ECU hardware is developed using components off the shelf.
In contrast, designing the control software from predefined software components as
well is a relatively new approach [Goeh98].

1.2. Component-Based Control Software 

Development of component-based control software means that new application 
software for an embedded system is composed of a set of configurable (with 
parameters) software components, which have been explicitly developed for multiple 
usage [Goeh98]. In the following, the term component always denotes a software 
component. Applying components and a component model promises many
advantages, such as reduced development time and increased quality of the
composed control software. The higher quality is achieved by using prefabricated
components, which have been extensively tested during their development. Applying 
uniform communication and execution mechanisms, provided by a component 
model, also reduces the number of possible design errors. The components exchange 
information by exchanging events and data on the connections between their 
interfaces. 
Figure 1: Example of modelling component-based control software with ASCET-SD

Figure 1 presents an example of component-based control software, which has
been developed with the CASE tool ASCET-SD (Advanced Simulation and Control
Engineering Tool - Software Development) from ETAS GmbH [ETAS00]. ASCET-SD
supports modelling, simulation and target code generation of component-based
automotive control software and has been applied in the context of this research
work. ASCET-SD offers different graphical and textual editors for modelling
components and applications. The internal behaviour of components can be specified
in different notations, e.g. control block diagrams, state machine diagrams, a
Java-based specification language or standard C code.

1.3. Difficulties in the Validation 

Control software for embedded systems is characterised by real-time requirements, 
distribution and increasing complexity. The validation of functional and real-time 
requirements of a developed component-based control software is a difficult task 
[FlRi97], [GuNae99]. The testing of control software for example needs the ECU, 
the controlled hardware components and special debugging devices for real-time 
data monitoring. But very often the hardware components, i.e. the car or a part of it, 
and the ECUs are not available at the beginning of the software development, which 
leads to a late validation of the developed control software. The late validation 
results in a delayed detection of design errors in the control software, which causes 
increased fixing costs and delayed project schedules. For that reason simulation has 
been selected in this research work to enable the early validation of
component-based control software during the early development phases [Flei99], [Flei00].

The quality of prefabricated components is verified by extensive testing during 
their development. But when new control software is composed from such
components, its level of correctness cannot automatically be considered equal to
that of the original individual components. When such black-box components are
used for composing new control software, monitoring the dynamic behaviour inside
the components during simulation is assumed to be impossible. But the dynamic
behaviour of the component-based control software is observable on the connections
between the connected components. For this purpose, in particular the events
exchanged between the connected components are monitored in the simulation model
when executing the simulation of the component-based control software.

2. Early Validation of Component-Based Control Software

The validation process for early development phases, presented in Figure 2, 
addresses two major goals: 

• Demonstration of the fulfilment of user requirements 

• Detection of design errors in the component-based control software 

Both are checked by an offline comparison of the simulated dynamic behaviour of a 
simulation model against the specified required behaviour. Note that when already
verified components are used, design errors in component-based control software are
mainly reduced to composition and configuration errors.

The component-based development and validation process consists mainly of the
following seven steps:

1. Use case analysis for gathering testable requirements from a user's point of
view

2. Transformation of use case scenarios into extended UML sequence diagrams 

3. Design and modelling of the component-based control software and 
generation of an executable simulation model 

4. Definition of stimuli sequences to complete the test cases 
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5. Simulation of the test cases for recording the communicated events between 
the components in the simulation model 

6. Offline comparison between the ‘Simulated Event Sequences’ (SES) and the 
‘Required Event Sequences’ (RES) for validation and error detection 

7. Correction of the component-based control software or the use cases if a 
contradiction (possible error) has been detected 

The test case design (steps 1, 2 and 4 in Figure 2) is part of the development and 
validation process and is explained in more detail in the following sections. 


3. Test Case Design 



3.1. Test Case 

In the context of this work, a single test case consists of a stimuli sequence and an 
expected output sequence of events, which is denoted as ‘Required Event Sequence’ 
(RES). The stimuli sequence defines a sequence of input events that trigger the 
simulation model. During simulation, the monitored sequence of events is recorded in 
a ‘Simulated Event Sequence’ (SES). A test case is validated if the SES conforms to 
the RES; it detects a possible error if a contradiction between SES and RES is found. 
The following sections describe how test cases are designed from use cases. 

3.2. Gathering Testable Requirements with Use Cases (Step 1) 

According to Jacobson, the originator of use case analysis, “A use case is a specific 
way of using the system by performing some part of the functionality. Each use case 
constitutes a complete course of events initiated by an actor and the system. ... The 
collected use cases specify all the existing ways of using the system” [Jaco92]. The 
main concepts of use case modelling are actors and use cases. An actor represents an 
entity (a human being, a machine or a computer system) external to the system under 
development that communicates with the system in order to achieve certain goals. A 
use case is a generalisation of a usage situation in which one or more actors interact 
with the system to accomplish specific goals. One use case may cover several 
sequences of events, so-called scenarios. A use case may be described either from 
an external (black-box) or from an internal (white-box) point of view. 

This scenario-driven modelling view of use cases was the main reason for selecting 
them to describe testable requirements for the dynamic behaviour of component-based 
control software. Actors on the requirements level are not equal to components on the 
design level, but the sequences of events (scenarios) between the actors can be 
compared against the sequences of events between the components in the simulation 
model, which is generated from the component-based design model. A further 
advantage of use cases in general is that they serve both software design and 
validation. The requirements do not come out of the blue but are systematically 
gathered by writing use cases. Use cases cannot describe requirements completely, but 
if a designed control software is later tested by simulation against all specified use 
cases, it can at least be stated that the validation is complete from a user's point of 
view. 

Cockburn [Cock97] has introduced a helpful template for writing use cases. 
Based on this template, we have developed a similar use case template, which is 
presented in Figure 3 [Flei99]. Additionally, we have defined writing rules that 
guide the developer in writing precise use case scenarios [Flei00]. The structured 
table format and the writing rules support the specification of precise and 
meaningful requirements. 



USE CASE <No>     <name>
Goal              <>
Preconditions     <>
Postconditions    <>
Actors            <>
Main scenario     Step   Action
                  1.1    <>
                  1.2    <>
                  1.3    <>
Scenario 2        Step   Action
                  2.1    <>
                         <>
Variations        <>
Exceptions        <>
Notes             <>

Figure 3: Use case template 

3.3. Transformation to Required Event Sequences (Step 2) 

Although the use case tables are already in a structured format, they are not suitable 
for tool-based automatic testing. To obtain a testable format of the requirements, all 
use case scenarios are transformed manually into extended UML sequence diagrams, 
denoted as ‘Required Event Sequences’ (RES), as presented in Figure 4. Use cases 
written according to the writing rules ease this manual transformation. The graphical 
specification of the RES is supported by a self-developed sequence diagram editor. The 
RES are based on the UML sequence diagram notation [UML00] and have been extended 
to specify additional real-time conditions. Timing conditions between two events can 
be specified as relative maximum, minimum or interval times. 
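The three kinds of relative timing conditions can be illustrated with a small Python sketch that checks a condition between the time stamps of two events. It is a minimal sketch under stated assumptions: the class name TimingCondition and its fields are hypothetical and not part of the authors' sequence diagram editor, and times are assumed to be in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimingCondition:
    """Hypothetical relative timing condition between two RES events.

    Only max_delay set -> maximum time
    Only min_delay set -> minimum time
    Both set           -> interval time
    """
    min_delay: Optional[float] = None
    max_delay: Optional[float] = None

    def holds(self, t_first: float, t_second: float) -> bool:
        delay = t_second - t_first
        if delay < 0:  # the second event must not precede the first one
            return False
        if self.min_delay is not None and delay < self.min_delay:
            return False
        if self.max_delay is not None and delay > self.max_delay:
            return False
        return True


# Example: "after a maximum delay of 100 ms" between two events
start_motor = TimingCondition(max_delay=100)
print(start_motor.holds(0, 80))   # True
print(start_motor.holds(0, 120))  # False
```

Keeping both bounds optional lets one type represent all three condition kinds, with an interval simply being a condition that sets both.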
After the required order and timing of a sequence of events have been specified, three 
additional test case attributes are set for every RES. The attribute precondition is 
taken over from the use case scenario. The attributes sequence type (normal, 
mandatory or forbidden) and comparison type (event repetition allowed or not) are 
set according to the test strategy selected for the offline comparison. If the sequence 
type normal is selected, the RES is compared only against the respective SES. If the 
sequence type mandatory or forbidden is selected, the RES is compared against all SES 
to check whether the sequence of events occurs after the precondition has become true. 
If a forbidden RES occurs after the precondition has become true, this is an error. 
Depending on design decisions in the component-based development, components may 
send the same event periodically. Therefore, the comparison type defines whether a 
required event in the RES may occur repeatedly in the recorded SES. The attribute 
comparison type was introduced after first practical experience with the offline 
comparison algorithms. 
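The following Python sketch shows one possible reading of the sequence types mandatory and forbidden when a RES is evaluated against all recorded SES. It is illustrative only and rests on two assumptions that the text does not state: the instant at which the precondition becomes true is marked by a dedicated event in each SES, and a mandatory RES counts as violated when it does not occur after that point. All names are hypothetical.

```python
from enum import Enum


class SequenceType(Enum):
    NORMAL = "normal"
    MANDATORY = "mandatory"
    FORBIDDEN = "forbidden"


def occurs_in_order(res, ses):
    """True if all event names of res appear in ses in the same order;
    other events may be interleaved."""
    remaining = iter(ses)
    # `in` on an iterator consumes it up to the match, which preserves order
    return all(event in remaining for event in res)


def check_against_all(res, seq_type, all_ses, precondition_event):
    """Return the indices of the SES that contradict a mandatory or
    forbidden RES. Only the part of each SES after the first occurrence of
    precondition_event (a hypothetical marker event) is inspected."""
    violations = []
    for k, ses in enumerate(all_ses):
        if precondition_event not in ses:
            continue  # precondition never became true: RES not applicable
        after = ses[ses.index(precondition_event) + 1:]
        found = occurs_in_order(res, after)
        if seq_type is SequenceType.FORBIDDEN and found:
            violations.append(k)
        elif seq_type is SequenceType.MANDATORY and not found:
            violations.append(k)
    return violations


runs = [["p", "a", "x", "b"], ["a", "b", "p"], ["p", "b", "a"]]
print(check_against_all(["a", "b"], SequenceType.FORBIDDEN, runs, "p"))  # [0]
```

A normal RES would instead be checked only against its own SES, for example with occurs_in_order alone.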



[Diagram: test case attributes Precondition, Sequence type and Comparison type above a sequence diagram in which actors A1 … An exchange Event 1 … Event m]






Figure 4: ‘Required Event Sequence’ (RES) 

3.4. Deriving Stimuli Sequences (Step 4) 

After the component-based design model has been developed and an executable 
simulation model has been generated from it, the test cases can be completed. The 
expected output sequence of events is already defined in a RES. To complete a test 
case, a stimulating input sequence of events has to be derived from the respective RES. 
The first event of the RES usually becomes the first stimuli event. Alternatively, 
other events that trigger this first event can become stimuli events. In addition, 
all events in the RES that are sent by an external actor (e.g. a switch or sensor) are 
candidates for stimuli events in the stimuli sequence. 
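The derivation rule can be sketched in Python as a simple filter over the RES: every event sent by an external actor is a candidate, and the first RES event is always kept. This is a minimal sketch; the ResEvent type and the event and actor names are hypothetical, loosely modelled on the wiper example in Section 5, and the alternative of replacing the first event by an event that triggers it is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResEvent:
    name: str
    sender: str
    receiver: str


def stimuli_candidates(res, external_actors):
    """Candidate stimuli events derived from a RES: every event sent by an
    external actor, plus the first RES event, kept in RES order."""
    candidates = [e for e in res if e.sender in external_actors]
    if res and res[0] not in candidates:
        candidates.insert(0, res[0])  # the first RES event usually starts the test
    return candidates


# Hypothetical event names loosely based on the wiper use case
res = [
    ResEvent("wiper_slow", "driver", "stalk"),
    ResEvent("pos_slow", "stalk", "wiper_control"),
    ResEvent("motor_slow", "wiper_control", "wiper_motor"),
    ResEvent("wiper_off", "driver", "stalk"),
]
print([e.name for e in stimuli_candidates(res, {"driver"})])
# ['wiper_slow', 'wiper_off']
```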

4. Test Case Simulation 

4.1. Preparation and Execution of the Simulation Model (Step 5) 

The generated simulation model of the component-based control software has to be 
prepared for the test case simulation. The ‘Test Driver’ component shown in 
Figure 5 contains the necessary supplements. 

The hardware components that are not yet available usually exhibit reactive 
behaviour, for example an electric wiper motor. For that reason, a simple environment 
model of the hardware components is added to the simulation model to simulate the 
closed-loop control behaviour of the overall system. The environment model is one 
part of the ‘Test Driver’. The other part of the ‘Test Driver’ contains the stimuli 
sequences, which inject the stimulating events for the simulation runs of the different 
test cases. 



[Diagram labels: Event Monitoring, Environment Model, CAN]



Figure 5: Simulation of test cases involving a single ECU 

For the simulation of test cases involving single ECUs, we selected the tool 
ASCET-SD for component-based modelling and real-time simulation [ETAS00]. 
ASCET-SD provides component-based modelling of control software as well as a 
convenient simulation environment for injecting stimuli and recording real-time 
data during simulation runs. Time-stamped events can be reconstructed offline from 
the recorded real-time data of the different simulation runs. 
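One simple way to reconstruct time-stamped events offline is to treat every value change of a periodically sampled signal as an event and then merge the per-signal events into one sequence ordered by time. The Python sketch below assumes recorded data in the form of (time, value) pairs; it does not model ASCET-SD's actual recording format, and the signal names are hypothetical. The time resolution of the reconstructed events is limited by the sampling period.

```python
def events_from_samples(samples, signal):
    """Turn periodically sampled (time, value) pairs of one signal into
    time-stamped events, emitting an event whenever the value changes."""
    events = []
    for (_, previous), (time, value) in zip(samples, samples[1:]):
        if value != previous:
            events.append((time, f"{signal}.{value}"))
    return events


def merge_into_ses(*event_streams):
    """Merge per-signal event lists into one sequence ordered by time stamp."""
    return sorted((e for stream in event_streams for e in stream),
                  key=lambda e: e[0])


stalk = [(0, "off"), (10, "off"), (20, "slow"), (30, "slow"), (40, "off")]
motor = [(0, 0), (10, 0), (20, 0), (30, 1), (40, 1)]
print(merge_into_ses(events_from_samples(stalk, "stalk"),
                     events_from_samples(motor, "motor")))
# [(20, 'stalk.slow'), (30, 'motor.1'), (40, 'stalk.off')]
```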

4.2. Offline Comparison and Corrections (Steps 6 and 7) 

The offline comparison tests the conformance between the SES and the 
corresponding RES for every simulated test case. Conformance is achieved if all 
events in the RES can be found in the respective SES in the right order and if all 
timing constraints are met. Additionally, each RES of the sequence type mandatory 
or forbidden is tested against all SES. A mapping table between the event names of 
the use case model and those of the simulation model is necessary to enable 
tool-based automatic checking. If the offline comparison detects contradictions, the 
possible errors in the component-based design model or in the specified use cases 
have to be investigated by the developer and corrected. Typical design errors 
are missing connections between components, wrong parameter values of 
components or improper dynamic behaviour on the system level. 
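A minimal Python sketch of the comparison of one RES against one SES is shown below. It assumes a greedy earliest-match strategy and supports only maximum-delay constraints. The function name, the name-mapping dictionary and the representation of the constraints are hypothetical and do not describe the authors' tool. The function returns how many required events were found in order, which is also how results are reported in the application example of Section 5.

```python
def offline_compare(res, ses, name_map, max_delay=None, repetition_allowed=True):
    """Compare one RES against one SES (greedy, earliest match).

    res:        required event names from the use case model, in order
    ses:        simulated events as (time stamp, event name), sorted by time
    name_map:   mapping table from use-case names to simulation-model names
    max_delay:  {i: limit} meaning res[i] must follow res[i-1] within limit
    repetition_allowed: if False, a repeated occurrence of the previously
                        matched event before the next required one is a
                        contradiction
    Returns the number of required events found; the SES conforms to the
    RES if this equals len(res).
    """
    max_delay = max_delay or {}
    pos, prev_name, prev_time = 0, None, None
    for i, required in enumerate(res):
        target = name_map.get(required, required)
        # Scan forward to the next occurrence of the required event
        while pos < len(ses) and ses[pos][1] != target:
            if not repetition_allowed and ses[pos][1] == prev_name:
                return i  # forbidden repetition of the previous event
            pos += 1
        if pos == len(ses):
            return i  # required event not found
        time = ses[pos][0]
        if i > 0 and i in max_delay and time - prev_time > max_delay[i]:
            return i  # timing constraint violated
        prev_name, prev_time = target, time
        pos += 1
    return len(res)


# Hypothetical names: use-case events mapped to simulation-model events
name_map = {"stalk_slow": "r1.wiper_slow", "stalk_off": "r4.wiper_off"}
res = ["stalk_slow", "motor.slow", "stalk_off", "motor.off"]
ses = [(0, "r1.wiper_slow"), (50, "motor.slow"), (60, "motor.slow"),
       (500, "r4.wiper_off"), (510, "motor.off")]
print(offline_compare(res, ses, name_map))  # 4, i.e. conformant
```

When timing constraints and repeated events interact, a greedy match can reject an SES that a backtracking search would accept. A complete tool would have to take this into account.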

5. Application Example: Test Case Design and Validation of 
an Automotive Wiper Control System 

The following application example is part of a distributed body electronics system 
and demonstrates the test case design as well as the capability of the validation 
process when component-based control software is modelled and simulated using 
ASCET-SD. The application example is an automotive wiper control system. The 
component-based software (see Figure 1) controls the hardware components rain 
sensor (for rain strength measurement), steering column stalk (for the driver's 
requests), wiper motor and dashboard display. A photo of the laboratory prototype is 
presented in Figure 6. 




During the early development phases, most of the hardware components were not 
available. For that reason, the component-based control software was validated by 
real-time simulation and offline comparison. Later on, once the real hardware 
components except the ECU were available, a hardware-in-the-loop simulation of the 
component-based control software was executed. Four use cases with 16 scenarios 
were gathered as initial requirements in the use case analysis as a starting point for 
the test case design. As an example, Figure 7 shows the main scenario of the use case 
‘Standard wiping front window’. 

Figure 6: Automotive wiper control system 



USE CASE W.1      Standard wiping front window
Goal              clean windscreen
Precondition      a) wiper motor(A3) off (wiper motor park position sensor(A2) in state ‘on’)
                  b) wiper motor(A3) on (wiper motor park position sensor(A2) in state ‘off’)
Postcondition     steering column stalk(A1) in position ‘wiper_off’, wiper motor park position sensor(A2) is ‘on’
Actors            driver(A0), steering column stalk(A1), wiper motor park position sensor(A2),
                  wiper motor(A3), wiper control(A4)

Main Scenario
Step   Action
1.1    driver(A0) switches steering column stalk(A1) in position ‘wiper_slow’ with precondition a).
1.2    steering column stalk(A1) sends its position ‘wiper_slow’ to wiper control(A4).
1.3    wiper control(A4) starts wiper motor(A3) with velocity ‘slow’ after a maximum delay of 100 ms.
1.4    driver(A0) switches steering column stalk(A1) in position ‘wiper_off’.
1.5    steering column stalk(A1) sends its position ‘wiper_off’ to wiper control(A4).
1.6    wiper motor park position sensor(A2) sends ‘on’ to wiper control(A4).
1.7    wiper control(A4) stops the wiper motor(A3) in park position with velocity ‘off’ after a maximum delay of 20 ms.

Figure 7: Main scenario of the use case ‘Standard wiping front window’ 

To obtain a testable format, the main scenario of the use case ‘Standard wiping 
front window’ is transformed into the RES in Figure 8. The test case attributes of the 
RES are set as follows: 

Precondition: Wiper motor ‘off’ 

Sequence_type: Normal 

Comparison_type: Event repetition allowed 

Figure 8: RES - Use case ‘Standard wiping front window’, main scenario 



To complete the test case belonging to this RES, the stimuli sequence is derived 
from the RES after the component-based control software has been modelled in 
ASCET-SD. In particular, the events sent by the actor driver could not be mapped to 
a software component. These events come from external hardware components and 
become stimuli events in the stimuli sequence. The events r1.wiper_slow and 
r4.wiper_off, which can be seen in Figure 8, became stimuli events because they come 
from the external actor driver. All stimuli events are sent by the ‘TestDriver’ 
component during simulation execution. The ‘TestDriver’ component also 
encapsulates the environment model of the missing hardware components. The 
injected event s6.current_error, for example, which can be seen in Figure 9, comes 
from the model of the wiper motor, which simulates the motor's dynamic behaviour. 

Figure 9 shows one SES from the simulation results, stimulated by the example test 
case. The test case was simulated under the assumption that a mechanical blocking of 
the wiper motor during standard wiping causes a current error. This error is detected 
by the hardware diagnosis component of the control software, which then sends the 
event s6.current_error. 




Figure 9: SES - Use case ‘Standard wiping front window’, main scenario 



The offline comparison of the RES against the corresponding SES showed that the 
SES is incomplete. The first five events, which are highlighted in Figure 10, were 
found, but the last two required events were not. The simulated mechanical blocking 
of the wiper motor causes a deadlock between the components ‘WiperCoordinator’ 
and ‘WiperControl’ in the component-based control software. A simple 
reconfiguration of the component ‘WiperCoordinator’ solved the problem. The 
example demonstrates how errors in the component-based control software are 
detected by test case simulation. 

Architecture and Design of Distributed Embedded Systems 

6. Conclusions 

Component-based development of control software for embedded systems promises 
improved overall quality and reduced time to market. However, the higher quality of 
single components does not imply the correctness of the composed component-based 
control software. Although design errors are limited to the composition and 
configuration of components and the overall number of design errors will be smaller, 
validation of component-based control software is still necessary. Simulation was 
selected to achieve earlier validation of component-based control software and early 
detection of design errors. The systematic, use case-based gathering of required 
sequences of events from a user's point of view is suitable for development as well as 
for test case design. The transformation of the use case scenarios into extended UML 
sequence diagrams enables an automatic, tool-based offline comparison of the 
simulated software. 

Using a professional CASE tool such as ASCET-SD is a major benefit for modelling 
and simulating component-based control software. ASCET-SD enables the simulation 
of component-based control software for single ECUs with respect to real-time 
behaviour. Additionally, it supports real-time data monitoring and the recording of 
time-stamped events, which are the necessary input for the offline comparison against 
the required sequences of events when validating component-based control software 
in early development phases. 

The effort for test case design and simulation pays off, because detecting and 
correcting the same errors in later ECU testing would cost at least the same amount of 
time. An additional benefit is gained by reusing the test cases for later ECU 
integration tests. 
TIMING CONSTRAINTS VALIDATION 

USING UPPAAL 
Schedulability Analysis 

Hongyan Sun 



This paper presents an approach that formally models real-time 
tasks and scheduling strategies as timed-automata, and 
then formalizes the timing constraints of tasks as reachability 
properties, which can thus be validated using the model checking 
tool Uppaal. The approach is detailed for two pre-emptive 
priority-driven scheduling strategies: rate monotonic priority 
assignment and the priority ceiling protocol. 



1. Introduction 

Real-time systems often have to satisfy hard real-time constraints, i.e. such a system 
consists of tasks that have to be completed within strict time constraints, also called 
deadlines. Missing a deadline can have catastrophic consequences (Burns and 
Wellings, 1995). 

Scheduling tasks so that real-time constraints are met is an important and active 
issue in real-time systems, and many scheduling algorithms and schedulability 
analyses have been established. However, the underlying computation model is 
usually restricted to a simple cyclic task model. Since most schedulability analyses 
identify only sufficient conditions, a given set of tasks may be schedulable even 
though it does not satisfy any of the known schedulability conditions (Burns, 
1991). Schedulability analysis is an extremely hard problem, even when the 
execution times of all tasks are precisely known (Balarin et al, 1998). In addition, 
most schedulability analysis methods are separated from the design of the system 
(e.g. tasks and scheduler), so that the schedulability analysis results may not be 
reliable. This is especially likely when more communication is involved in the 
system, e.g. in a distributed architecture (Kopetz, 1997), which embedded systems 
often require. 
In this paper, schedulability analysis is handled in a way that also allows other 
system properties, e.g. correctness, to be validated together with the timing 
constraints at an early stage of system development. Tasks and scheduling strategies 
are formally modelled using timed-automata (Alur and Dill, 1991), the timing 
constraints of tasks are formalized as reachability properties, and the model checking 
tool Uppaal (Larsen et al, 1997) is then used to verify these properties. The approach 
is discussed and illustrated through two pre-emptive priority-driven scheduling 
strategies: rate monotonic priority assignment (Liu and Layland, 1973) and the 
priority ceiling protocol (Liu et al, 1990). 

The model checking tool Uppaal has been successfully applied to several 
time-critical systems, e.g. (Iversen et al, 2000) and (Hune et al, 2000). It provides a 
description language for modelling system behaviour as networks of timed-automata, 
a model checker for checking reachability properties by exploring the state space, 
and a simulator for visualizing the execution traces of a system. 

The remainder of the paper is organized as follows. Section 2 focuses on the rate 
monotonic priority assignment scheduling strategy: it discusses how to model a task 
and a scheduler as timed-automata and shows how the timing constraints are 
formalized as a reachability property that can be verified using Uppaal. Section 3 
addresses the same issues for the priority ceiling protocol. Several sets of tasks are 
tested in both sections, and the respective test results are shown there. Section 4 
presents some concluding remarks. 
2. Rate Monotonic Priority Assignment 

The rate monotonic priority assignment (or rate monotonic for short) strategy 
assigns a task priority according to the period of the task, such that the shorter the 
period, the higher the priority. It assumes that: 

• Each task is periodic 

• Each task has a deadline equal to its period 

• Each task is independent 

• Each task has a fixed worst-case computation time 
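Under these assumptions, rate monotonic priority assignment reduces to sorting tasks by period. The Python sketch below also implements the classical utilisation bound of Liu and Layland (1973), a sufficient but not necessary schedulability test. This is why a task set can fail the test and still be schedulable, which motivates checking schedulability on a model instead. Priority ids follow the convention used in the models below, where a larger id means a higher priority. Function names and task values are illustrative assumptions.

```python
def rate_monotonic_ids(periods):
    """Assign priority ids by rate monotonic: the shorter the period, the
    higher the priority. Ids run from 1 (lowest) to n (highest), so a
    larger id means a higher priority."""
    longest_first = sorted(periods, key=periods.get, reverse=True)
    return {task: rank + 1 for rank, task in enumerate(longest_first)}


def liu_layland_bound(n):
    """Utilisation bound n(2^(1/n) - 1) of Liu and Layland (1973)."""
    return n * (2 ** (1 / n) - 1)


def utilisation_test(tasks):
    """Sufficient (not necessary) rate monotonic schedulability test.
    tasks: list of (C, T) pairs with D = T."""
    utilisation = sum(c / t for c, t in tasks)
    return utilisation <= liu_layland_bound(len(tasks))


print(rate_monotonic_ids({"a": 50, "b": 40, "c": 30}))  # {'a': 1, 'b': 2, 'c': 3}
print(utilisation_test([(1, 4), (2, 6)]))               # True
```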

2.1. Task Model 

Figure 1 shows a task model in terms of timed-automata (as supported by Uppaal) 
under the assumptions given above. Once it leaves the initial location Init, a task is 
modelled as an infinite loop that can be in one of the locations Suspended, Ready 
and Running. The Suspended location corresponds to a task that has not been 
released yet. A task is in Ready when it can execute but its priority may be lower 
than that of the currently running task. A task is in Running when it has control of 
the CPU. 







Figure 1. Task model 

In Figure 1, T is the period of a task, D the deadline and C the worst-case 
computation time; x, y and z are clocks. The integer variable t counts how many 
CPU time units a task has used; it is introduced because of the syntax of Uppaal. 
For the same reason, the channel count and the integer variable k are introduced. 
A task is identified by its priority id. In this case, the priority is assigned according 
to rate monotonic priority assignment. 

Each task is an instantiation of the task model that assigns values to the 
parameters (id, T, C, D). 

A task releases itself by raising the flag ready. It synchronizes with the scheduler 
through communication channels preempt, run and suspend. The integer variable cur 
denotes the current running task, and the integer variable enable shows whether the 
current highest priority task is valid or not. 

The fail channel is used to synchronize with a failure Observer, which is used for 
verification purposes. 

2.2. Scheduler 

Rate monotonic scheduling is a form of pre-emptive priority-driven scheduling. The 
scheduler is therefore modelled as shown in Figure 2. 

The model contains three locations: Idle, Select and Run. Initially, when no task 
is ready, the scheduler is in the location Idle. It moves to the Select location once 
the highest priority task among the currently ready tasks is valid. It then selects that 
task to run, i.e. it enters the location Run after synchronizing with the selected task 
over channel run. 



[Diagram: locations Idle, Select and Run; recoverable edge labels: enable==1, max>0, cur:=max, go?; run!, running[cur]:=1, enable:=0; suspend?, running[cur]:=0, cur:=0; enable==1, max>cur, preempt!, running[cur]:=0, cur:=0]



Figure 2. Scheduler model 

In the location Run, if another task with higher priority is ready (max > cur), the 
scheduler synchronizes over channel preempt with the currently running task, which 
is thereby pre-empted, and returns to the Idle location in order to make a new 
selection. If the currently running task completes its computation, the scheduler is 
notified through channel suspend and then returns to the Idle location. 

The flag running indicates whether a task is running or not. The integer variable 
max denotes the highest priority task that is ready. 



2.3. Timing Constraints Validation 

In order to validate timing constraints, an Observer process is employed, modelled 
as shown in Figure 3. 




Figure 3. Observer model 

The Observer synchronizes with the currently running or ready task over the fail 
channel and moves to the Fail location once the deadline of a task is missed. 

Thus, the timing constraint property becomes a reachability property. Formally, this 
property p is expressed in the Uppaal verification notation as: 

p: A[] (not Observer.Fail) 

It states that the Observer is never in the Fail location. This property can then be 
verified using Uppaal. 
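The reduction to reachability can be made explicit. "Always not Fail" is the negation of "Fail is reachable on some path", so in CTL-style notation p can be written as follows. In Uppaal's query language the reachability query on the right-hand side is E<> Observer.Fail, and p holds exactly when that query is not satisfied.

```latex
p \;\equiv\; \mathbf{A}\Box\,\neg\,\mathit{Observer.Fail}
  \;\equiv\; \neg\,\mathbf{E}\Diamond\,\mathit{Observer.Fail}
```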
2.4. Tests 

Three sets of tasks, shown in Tables 1 to 3, are taken from (Burns and Wellings, 
1995) for comparison with the classical method. These three sets of tasks are 
verified against the property p using the model checker of Uppaal, and the dynamic 
behaviour of the system is examined using the simulator of Uppaal. 

In the tables, id denotes the priority of a task, T the period, C the computation time 
and D the deadline. 



Table 1. The first set of tasks 

Task   Id   T    C    D
P1     3    30   10   30
P2     2    40   10   40
P3     1    50   12   50



Table 2. The second set of tasks 




The results show that for the first set of tasks, the property p is not satisfied, i.e. 
a deadline is missed. For the last two sets of tasks, the property p is satisfied, i.e. no 
deadline is missed. 
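The first result can be cross-checked with a plain discrete-time simulation of pre-emptive rate monotonic scheduling of the first task set (P1: T = D = 30, C = 10; P2: T = D = 40, C = 10; P3: T = D = 50, C = 12), with all tasks released at time 0. This Python sketch is not a model checker. It plays the roles of the scheduler and the Observer for the single synchronous-release scenario only, which for independent periodic tasks with D ≤ T is the worst case, so one hyperperiod is enough. The names are hypothetical, and Python 3.9 or later is assumed for math.lcm with several arguments.

```python
from dataclasses import dataclass
from math import lcm


@dataclass(frozen=True)
class Task:
    name: str
    prio: int      # priority id: larger value = higher priority
    period: int    # T
    wcet: int      # C
    deadline: int  # D, assumed <= T


def first_deadline_miss(tasks):
    """Unit-step simulation of pre-emptive fixed-priority scheduling with all
    tasks released at time 0. Returns (time, task name) of the first missed
    deadline within one hyperperiod, or None if every job meets its deadline."""
    horizon = lcm(*(t.period for t in tasks))
    jobs = []  # active jobs as [task, remaining computation, absolute deadline]
    for now in range(horizon + 1):
        # Observer: an unfinished job whose deadline has been reached fails
        for task, remaining, deadline in jobs:
            if remaining > 0 and now >= deadline:
                return now, task.name
        jobs = [job for job in jobs if job[1] > 0]
        if now == horizon:
            break
        # Release a new job of every task whose period starts now
        for task in tasks:
            if now % task.period == 0:
                jobs.append([task, task.wcet, now + task.deadline])
        # Scheduler: the highest-priority ready job runs for one time unit
        if jobs:
            running = max(jobs, key=lambda job: job[0].prio)
            running[1] -= 1
    return None


table1 = [Task("P1", 3, 30, 10, 30), Task("P2", 2, 40, 10, 40),
          Task("P3", 1, 50, 12, 50)]
print(first_deadline_miss(table1))  # (50, 'P3')
```

This is consistent with the result that property p is violated for the first set. P3 has executed only 10 of its 12 time units when its deadline at time 50 is reached.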



3. Priority Ceiling Protocol 

The rate monotonic scheduling strategy assumes that tasks are independent. When 
tasks need to access data or synchronize via protected shared resources, the potential 
for blocking is introduced, i.e. a task can be prevented from accessing a resource by 
another task that has already locked the same resource. This introduces the problem 
of uncontrolled priority inversion, i.e. a higher priority task is blocked by a lower 
priority task, which is undesirable. 
The priority ceiling protocol was developed to make this problem controllable, i.e. 
to reduce the worst-case blocking time of a task to at most the duration of execution 
of a single critical section of a lower priority task. 

The basic idea of the protocol is to assign a priority ceiling to each semaphore 
that protects a critical section. The priority ceiling of a semaphore is equal to the 
priority of the highest priority task that may use it. A task can enter a critical section 
only if its priority is higher than the priority ceilings of all semaphores locked by 
other tasks; otherwise it is blocked, and the blocking task inherits its priority. 
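The access rule can be written as a short Python predicate. When a request is refused, the blocking task is the holder of the semaphore with the highest ceiling among those locked by other tasks. This is a minimal sketch with hypothetical names. The example ceilings assume that S1 and S2 are used by the tasks with priorities 1 and 2, and S3 by the task with priority 3.

```python
def may_enter(task, priority, locked, ceiling):
    """Priority ceiling protocol access rule.
    locked:  {semaphore: task currently holding it}
    ceiling: {semaphore: priority ceiling}
    A task may enter a critical section only if its priority is higher than
    the ceilings of all semaphores locked by other tasks."""
    return all(priority > ceiling[s]
               for s, holder in locked.items() if holder != task)


def blocking_task(task, priority, locked, ceiling):
    """If the request is refused, return the holder of the highest-ceiling
    semaphore locked by another task; that task inherits the priority of the
    blocked task. Returns None if the task may enter."""
    if may_enter(task, priority, locked, ceiling):
        return None
    others = {s: h for s, h in locked.items() if h != task}
    return others[max(others, key=ceiling.get)]


ceiling = {"S1": 2, "S2": 2, "S3": 3}
locked = {"S2": "P1"}  # P1 (priority 1) holds S2
print(may_enter("P2", 2, locked, ceiling))      # False: 2 is not above ceiling 2
print(blocking_task("P2", 2, locked, ceiling))  # P1
print(may_enter("P3", 3, locked, ceiling))      # True
```

In the example, P2 is blocked even if it requests S1, which is free. This ceiling blocking is what bounds P2's worst-case blocking to a single critical section of P1.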

This scheduling strategy is modelled using four primary models: the task model, the 
priority control model, the scheduler model and the semaphore model, which are 
discussed in the following subsections. 



3.1. Task Model 

The task model is similar to the one in Figure 1, but it allows different initial 
release times and dependent tasks, i.e. a task can access its critical sections during 
execution. Figure 4 shows only a part of the model; the complete model is given at 
http://www.it.dtu.dk/~hs/DIPES. 







Figure 4. A part of the task model 

When a task is in the location Running, it may try to access its critical section by 
synchronizing with the priority control process over channel try once the instant of 
accessing the critical section is reached. If its priority is not high enough, it is 
blocked, which the scheduler signals over channel block, and it then returns to the 
Ready location. The integer variable cr indicates whether the task is in the critical 
section or not, and pp indicates whether the task was accessing the critical section 
when it was pre-empted. If the task is pre-empted while in the critical section, it 
informs the priority control process at the same time through the synchronization 
channel pr. 

Each task is an instantiation of the task model that assigns values to the 
parameters (id, T, C, D, T1, n, t1, t2, sem), where id, T, C and D are respectively 
the priority, the period, the computation time and the deadline of a task. T1 denotes 
the initial release time of a task. The integer variable n indicates how many critical 
sections a task is going to access. The arrays t1, t2 and sem are of size n and provide 
the information about accessing these n critical sections. For 1 ≤ i ≤ n, t1[i] gives 
the instant at which the ith critical section will be accessed, relative to […]. Adding 
the time delay d[i] gives an absolute time. t2[i] gives the interval during which the ith 
critical section executes, and sem[i] indicates which semaphore the task will lock in 
order to access the ith critical section. The integer variable i indexes these arrays. The 
variable pri denotes the priority of a task; the original priority of a task is equal to its 
id. The integer variable tryS indicates which semaphore a task is going to lock. 



3.2. Priority Control 



The priority control process ensures that a task has a sufficiently high priority to 
access its critical section according to the protocol. Figure 5 shows a part of the 
model. The complete model is given at http://www.it.dtu.dk/~hs/DIPES. 

Figure 5. A part of the priority control model 

In Figure 5, the model contains four locations: Wait, Check, Critical and Preem. 
When the priority control process is in the location Wait, it synchronizes with a task 
process over channel try if that task wishes to access a critical section. In the 
location Check, it then checks whether that task has a sufficiently high priority. If the 
priority is not high enough, it raises the flag bio, so that the task is blocked and the 
blocking task inherits the priority of the blocked task. This priority inheritance is done 
through three assignments: temp:=pri[cur], pri[cur]:=pri[lock], pri[lock]:=temp, 
where pri[cur] is the priority of the blocked task and pri[lock] is the priority of the 
blocking task. If the task has a sufficiently high priority, the process allows it to lock 
the semaphore protecting the desired critical section by synchronizing with the 
semaphore process over channel p. 

The priority control process is in the location Critical when the task is in the 
critical section. It moves to the Preem location if a higher priority task pre-empts 
the currently running task, and at the same time it informs the corresponding 
semaphore process over channel preem. It informs the corresponding semaphore 
process over channel v when the task completes its critical section, and the task 
then resumes its original priority. Nested critical sections are supported through 
synchronization over channel try in the location Critical. 

The integer variable j points to the top of the stack stackS, which contains all the 
locked semaphores. For the semaphore stackS[j], t[stackS[j]] denotes how long its 
protected critical section has been executing, tend[stackS[j]] denotes the time at 
which that critical section will be completed, and locked[stackS[j]] indicates which 
task holds the lock on the semaphore. 



3.3. Semaphore 



A semaphore protecting a critical section is modelled as shown in Figure 6. 



[Diagram: locations Unlocked and Locked; recoverable edge labels: v?, curS==ids, top:=top-1, enable:=0]



Figure 6. Semaphore model 

Each semaphore process is an instantiation of the semaphore model that assigns 
values to the parameters (ids, pc), where ids is the identifier of the semaphore and 
pc is the priority ceiling assigned to it. 

When a semaphore is not locked, it is in the location Unlocked. It moves to the 
location Locked if the priority control process allows a task to obtain the lock on it 
(synchronization over either channel p or channel resume). The resume channel is 
used when the task was pre-empted after obtaining the lock and now inherits a 
higher priority and resumes its lock. A stack stackP contains the priority ceilings of 
all currently locked semaphores. The integer variable top points to the top of that 
stack. 

In the location Locked, the semaphore synchronizes with the priority control 
process over channel preem if the task is pre-empted while in the critical section. It 
synchronizes with the priority control process over channel v when the task 
completes its critical section. 
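The ceiling stack can be sketched in Python as a fixed-size array with a top index, in the style of Uppaal, which has no dynamic lists. The class and method names are hypothetical. The sketch assumes that semaphores are released in the reverse order of locking (properly nested critical sections) and does not reproduce the preem/resume handling of the original model. The value 0 stands for "no semaphore locked", which assumes that priorities start at 1.

```python
class CeilingStack:
    """Fixed-size stack of the priority ceilings of locked semaphores,
    mirroring an array with a top index as in the semaphore model."""

    def __init__(self, capacity):
        self.entries = [0] * capacity
        self.top = 0  # number of stored ceilings

    def push(self, pc):
        """Called when a semaphore becomes locked."""
        if self.top == len(self.entries):
            raise OverflowError("more locked semaphores than capacity")
        self.entries[self.top] = pc
        self.top += 1

    def pop(self):
        """Called when the most recently locked semaphore is released."""
        if self.top == 0:
            raise IndexError("no locked semaphore")
        self.top -= 1
        return self.entries[self.top]

    def highest_ceiling(self):
        """Highest ceiling among the locked semaphores (0 if none)."""
        return max(self.entries[:self.top], default=0)


stack = CeilingStack(3)
stack.push(2)
stack.push(3)
print(stack.highest_ceiling())  # 3
```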



3.4. Scheduler 
The scheduler model is similar to the one in Figure 2, except that in the location Run 
it synchronizes with the currently running task process over channel block when the 
task is blocked. The detailed model is given at http://www.it.dtu.dk/~hs/DIPES. 

3.5. Timing Constraints Validation 

The same observer as in Figure 3 is used to verify the property p with
Uppaal.



3.6. Tests 

Figure 7 shows two sets of tasks, which are verified against the property p using
the Uppaal model checker. The system behaviours are also simulated using Uppaal.
The results show that both sets of tasks satisfy the timing constraints.
Figure 7. Two tested sets of tasks

In both cases (a) and (b), task P3 has the highest priority and P1 the lowest. P1
is released first and P3 last. The period of all the tasks is 20 time units.
The deadlines of P3, P2, and P1 are 75, 18, and 20 time units, respectively.

In case (a), both P1 and P2 access critical sections 1 and 2, and P3 accesses
critical section 3. P1 locks S2 and enters critical section 2 after running for one time
unit. After three time units in critical section 2, it locks S1 and accesses critical
section 1 for one time unit. After unlocking S1, it continues in critical section 2 for
another two time units and then unlocks S2. Finally, after running another two time
units, it finishes one computation cycle. The computation time of P1 is 9 time
units.
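As a quick arithmetic check of the case (a) description, one cycle of P1 can be written as a list of segments, each with its duration and the semaphores held during it. This encoding is only an illustration and not the Uppaal task model. The segment durations add up to the stated 9 time units, and the locks follow a stack (LIFO) order.

```python
# One computation cycle of P1 in case (a), written as consecutive segments:
# (duration in time units, semaphores held during the segment, outermost first).
P1_CASE_A = [
    (1, ()),            # runs one time unit before locking S2
    (3, ("S2",)),       # three units in critical section 2
    (1, ("S2", "S1")),  # nested critical section 1 for one unit
    (2, ("S2",)),       # back in critical section 2 for two units
    (2, ()),            # two more units after unlocking S2
]

def computation_time(segments):
    """Total execution time of one cycle."""
    return sum(duration for duration, _ in segments)

def hold_time(segments, sem):
    """Total time during which sem is locked in one cycle."""
    return sum(duration for duration, held in segments if sem in held)

def is_nested(segments):
    """Check that locks are acquired and released in stack (LIFO) order."""
    prev = ()
    for _, held in segments:
        # each step may push one semaphore, pop the top one, or keep the same set
        if not (held[:-1] == prev or prev[:-1] == held or held == prev):
            return False
        prev = held
    return prev == ()  # everything is released by the end of the cycle
```

With this encoding, S2 is held for 6 of the 9 time units and S1 for 1.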

Similarly, P2 first locks S1 and then S2, and unlocks S2 and then S1, as shown in the
figure. The computation time of P2 is 7 time units. P3 locks S3, accesses
critical section 3 for one time unit, and then unlocks S3. Its computation time is 5 time
units.

In case (b), P1 accesses critical sections 1 and 2, P2 critical section 2, and
P3 critical sections 1 and 3. Again, P1 first locks S2 and then S1, and unlocks S1 and then
S2. P2 only locks S2 for one time unit and then unlocks it. P3 first locks S3
for one time unit, and after unlocking S3 it locks S1 for one time unit and then
unlocks S1. The computation times of P1, P2, and P3 are 11, 4, and 5 time
units, respectively.

In both cases, the Uppaal model checker shows that the property p is satisfied.



4. Conclusion 

This paper uses two scheduling strategies, rate monotonic priority assignment and
the priority ceiling protocol, as examples to discuss and illustrate a formal approach to
schedulability analysis by means of the model checking tool Uppaal. The approach
also allows other properties, e.g. correctness, to be validated together with the timing
constraints of real-time tasks. In addition, the restrictions on the task model (as in
most classical schedulability analyses) can be relaxed; for example, different
deadlines and initial release instants are allowed, as shown in the paper. Different or
more concrete task models could also be allowed.

At the present stage of this work, the correctness of tasks and scheduling
strategies is validated by using the simulator to visualize traces of the system
behaviours. Correctness could instead be formalized as properties that can be verified with the
model checker of Uppaal.
Abstract Real-time systems require both functionally correct
executions and results that are produced in time. Thus, the
scheduling algorithm is an important component of these systems.
Several dynamic scheduling algorithms for real-time
multiprocessor systems that use heuristic approaches, such as the
well-known myopic algorithm and its variations, have been proposed.
However, the task assignment policies used by all these
scheduling algorithms hinder the improvement of the scheduling
success ratio. In this paper, we propose a new dynamic scheduling
algorithm, called the “thrift algorithm”, for real-time multiprocessor
systems. By using a new task assignment policy, the thrift
algorithm improves the scheduling success ratio. To study the
effectiveness of the thrift algorithm, we have
conducted extensive simulation studies and compared its
scheduling success ratio with that of the myopic algorithm when
several task parameters are varied. Simulation results
demonstrate that the scheduling success ratio of the thrift algorithm is
superior to that of the myopic algorithm.

Key Words Multiprocessor, real-time systems, dynamic 
scheduling, scheduling success ratio, backtracks 



1 Introduction

As real-time applications become more and more complex, multiprocessor systems have
emerged as a powerful computing means for such complicated
applications because of their high performance and reliability. Real-time
multiprocessor systems have been used widely in the fields of nuclear power plants,
flight control and avionics. In real-time systems, the computing results must not only
be functionally correct but also be produced in time. This means that the scheduling
algorithm is an important component of these systems. In general, the problem of
multiprocessor scheduling is to determine when and on which processor a given task
executes. Real-time multiprocessor scheduling can be either static or dynamic. In
static algorithms, the assignment of tasks to processors and the time at which tasks
start execution are determined a priori. Thus, their main advantage is that, if a
solution is found, one can be sure that all deadlines will be met. However,
static scheduling algorithms are not suitable for dynamic real-time systems, where
tasks arrive dynamically and their characteristics are not known a priori. In this case,
dynamic scheduling algorithms are needed.

In dynamic scheduling for real-time multiprocessor systems, when new
tasks arrive, the scheduler must dynamically determine the feasibility of the new tasks and
schedule them without jeopardizing the guarantees that have been
provided for the previously scheduled tasks. Therefore, schedulability analysis must
be done before a task’s execution begins. If the schedulability analysis is
successful, tasks are dispatched according to the resulting feasible schedule.

Dertouzos and Mok [4] have shown that an optimal scheduling algorithm does
not exist for dynamic real-time systems with more than one processor and/or tasks
that have mutual exclusion constraints. These negative results point out the need for
heuristic approaches to solve the scheduling problem in such systems.

Several dynamic scheduling algorithms for real-time multiprocessor systems
using heuristic approaches have been proposed. Krithi Ramamritham et al. [1] presented
the well-known myopic scheduling algorithm for dynamic scheduling in
real-time multiprocessor systems. Based on traditional heuristic approaches, the myopic
scheduling algorithm limits the number of tasks considered in one scheduling step
and thereby reduces the complexity of the algorithm. Furthermore, the myopic algorithm
considers the situation in which tasks require other resources besides processors.
Later, building on the myopic scheduling algorithm, Manimaran and Murthy [2] proposed
a new dynamic scheduling algorithm for real-time multiprocessor systems. They
used the parallelization of tasks to improve the performance of the scheduling
algorithm. The integrated real-time scheduling algorithm presented by Anita Mittal
et al. [3] is also based on the myopic algorithm. It is applied in multiprocessor systems
in which hard real-time tasks coexist with soft real-time tasks. However, all the scheduling
algorithms addressed above use the same task assignment policy: when a
task is chosen to extend the current partial schedule, the earliest available
processor among all processors that can meet the task’s deadline is selected.
This assignment can cause scheduling failures for unscheduled tasks, since it
delays their start times. Moreover, the algorithms in [2] and [3] are not
applicable when the real-time tasks are not parallelizable or when all tasks in the system
are hard real-time tasks. To solve these problems, we propose a new dynamic
scheduling algorithm, called the “thrift algorithm”, for multiprocessor real-time systems.
This new algorithm improves the scheduling success ratio by introducing a new task
assignment policy.

The rest of this paper is organized as follows: Section 2 introduces some basic
concepts of our scheduling algorithm, including the task model, the scheduler model, and
some definitions. Section 3 describes the thrift algorithm and its analysis. Simulation
studies are presented in Section 4. In Section 5, we state some conclusions.



2 Basic Concepts 

2.1 Task Model

We assume that there are m processors in a homogeneous real-time multiprocessor
system (m > 1). The characteristics of tasks in this real-time multiprocessor system are
listed below:

i) Each task T is characterized by its arrival time ($a_T$), ready time ($r_T$), and worst-case
computation time ($C_T$).

ii) Tasks are nonpreemptive, nonperiodic and independent. 

iii) Besides processors, tasks might need some other resources such as data 
structures, variables and communication buffers for their executions. Every 
task can access a resource either in shared mode or in exclusive mode. 

2.2 Scheduler Model 

We use a centralized scheduling scheme in this scheduling algorithm. In this
scheme, all tasks arrive at a central processor, called the central scheduler,
from where they are distributed to the other processors in the system for execution
[2]. Each processor has its own dispatch queue. This organization ensures that
the processors will always find some tasks in their dispatch queues when they finish
the execution of their current tasks. The scheduler and the processors communicate
through the dispatch queues. Moreover, the central scheduler runs
in parallel with the processors, scheduling newly arriving tasks and
periodically updating the dispatch queues.

2.3 Terminology 

Definition 2.1 A task is feasible in a schedule if its timing constraint and resource 
requirements are met in the schedule. A schedule for a set of tasks is said to be a 
feasible schedule if all the tasks are feasible in the schedule [2]. 

Definition 2.2 A partial schedule is a feasible schedule for a subset of tasks. A 
partial schedule is said to be strongly feasible if all the schedules obtained by 
extending the current schedule by any one of the remaining tasks are also feasible 
[2][3]. 

Definition 2.3 $EAT_k^s$ ($EAT_k^e$) is the earliest time when resource $R_k$ becomes
available for shared (or exclusive) access [2] [3].

Definition 2.4 IEST(T) is the ideal earliest start time of a task T. Let PE be the set
of processors and $R_T$ be the set of resources required by task T. Then

$$IEST(T) = \max\Big(r_T,\ \min_{P \in PE} availtime(P),\ \max_{R_k \in R_T} EAT_k^u\Big),$$

where availtime(P) denotes the earliest time at which the processor P becomes
available for executing a task, and the third term denotes the maximum among the
earliest available times of the resources required by task T (u = s for shared mode and
u = e for exclusive mode) [2].
Definition 2.5 AST(T) is the actual start time of task T.

Definition 2.6 gaptime(T,P) denotes the gap between the deadline of task T and the
earliest time at which the processor P becomes available for executing a task, i.e.,
$gaptime(T,P) = d_T - availtime(P)$.

Definition 2.7 avail(T,P) denotes the feasibility of task T for executing on the 
processor P. If processor P can provide enough time for task T’s execution, 
avail(T,P) = 1; otherwise, avail(T,P) = 0. 
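Definitions 2.4, 2.6 and 2.7 can be written as small functions. This is a sketch under stated assumptions. The function and parameter names are hypothetical. Resource usage is given as a map from resource index to "s" or "e". Definition 2.7 only says that P must "provide enough time", so avail() is read here as: T, started as early as its ready time, P and its resources allow, still finishes by $d_T$.

```python
def iest(ready, availtime, usage, eat_s, eat_e):
    """IEST(T) = max(r_T, min_P availtime(P), max_{R_k in R_T} EAT_k^u).

    usage maps each required resource k to "s" (shared) or "e" (exclusive);
    eat_s / eat_e map k to EAT_k^s / EAT_k^e.
    """
    resource_term = max(
        (eat_s[k] if mode == "s" else eat_e[k] for k, mode in usage.items()),
        default=ready,  # no resources: the third term does not raise the maximum
    )
    return max(ready, min(availtime), resource_term)

def gaptime(deadline, availtime_p):
    """gaptime(T,P) = d_T - availtime(P)."""
    return deadline - availtime_p

def avail(ready, comp, deadline, availtime_p, resources_ready=0):
    """Assumed reading of avail(T,P): 1 if T, started as early as possible on P, meets d_T."""
    start = max(ready, availtime_p, resources_ready)
    return 1 if start + comp <= deadline else 0
```

For example, a task with $r_T = 2$ that needs R1 in shared mode ($EAT_1^s = 4$) and R2 in exclusive mode ($EAT_2^e = 7$), with processors available at times 5 and 3, has IEST = max(2, 3, 7) = 7.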



3 Thrift Algorithm 

3.1 Algorithm

The myopic algorithm uses a heuristic search for dynamically arriving tasks that have
resource constraints. In the search tree, a node denotes a partial schedule. The
schedule is extended from a node only when the partial schedule denoted by that node is
strongly feasible. Furthermore, the feasibility check is performed only for the K tasks in the
feasibility check window.

The task assignment policies used by several dynamic scheduling algorithms
based on the myopic algorithm, including the myopic algorithm itself, hinder the
improvement of the scheduling success ratio. To solve this problem, we propose the
thrift algorithm, which is based on the myopic algorithm. In the thrift algorithm, a new task
assignment policy is used to improve the feasibility of unscheduled tasks and the
scheduling success ratio.

We assume that the real-time system consists of S resources denoted by
$R_1, R_2, \ldots, R_S$. The thrift algorithm maintains a task-resource list (RET) that records
the tasks’ resource usage. RET has S entries. Each resource maps to one entry in RET,
which has two record variables: one records the number of tasks that
access this resource, and the other records the number of tasks that access this
resource in exclusive mode. When a new task arrives, these two record variables are
updated according to the task’s resource usage.
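A minimal sketch of the task-resource list follows. It shows how the two counters per entry are enough to decide the distinction used in rules 2.2 and 2.3 of the assignment policy: either T's resources are not shared with the remaining tasks, or every overlap is in shared mode on both sides. The class and method names are hypothetical. The sketch assumes that T itself is still registered when it is checked, and that a task is unregistered once it has been scheduled. The paper does not say when the counters are decremented.

```python
class TaskResourceList:
    """RET: per-resource counters [tasks using it, tasks using it exclusively]."""

    def __init__(self, num_resources):
        self.entries = [[0, 0] for _ in range(num_resources)]

    def register(self, usage):
        """Called when a task arrives; usage maps resource index -> 'shared' or 'exclusive'."""
        for k, mode in usage.items():
            self.entries[k][0] += 1
            if mode == "exclusive":
                self.entries[k][1] += 1

    def unregister(self, usage):
        """Assumed to be called once the task has been scheduled."""
        for k, mode in usage.items():
            self.entries[k][0] -= 1
            if mode == "exclusive":
                self.entries[k][1] -= 1

    def only_shared_or_disjoint(self, usage):
        """True if, ignoring T itself, no remaining task uses T's resources,
        or every such overlap is accessed in shared mode by all tasks involved."""
        for k, mode in usage.items():
            others = self.entries[k][0] - 1
            others_excl = self.entries[k][1] - (1 if mode == "exclusive" else 0)
            if others > 0 and (mode == "exclusive" or others_excl > 0):
                return False
        return True
```

When only_shared_or_disjoint() returns True, rule 1 applies. Otherwise the policy falls through to the case analysis of rule 2.4.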

In the thrift algorithm, the new task assignment policy is denoted by the function
choosep(T). This function returns the number of the processor selected for task T. The
basic idea of this task assignment policy is to delay the actual start time of task T as
much as possible without missing task T’s deadline, i.e., to let the available time of the selected
processor approach task T’s deadline as closely as possible. The detailed task
assignment policy is listed below (we assume that $EST_{R_T}$ is the earliest available
time of the resources required by task T):

1) If task T requires no resources other than the processors, we will
choose the processor P that meets the following constraint for task T:

$$gaptime(T,P) = \min_{pe \in \{pe \mid pe \in PE \text{ and } avail(T,pe)=1\}} gaptime(T,pe).$$

2) If task T requires some other resources besides the processors, we will employ 
the method listed below: 

2.1) Check the task-resource list (RET). 




A New Dynamic Scheduling Algorithm for Real-Time Multiprocessor Systems 177 



2.2) If there is no intersection between the resources required by task T and the 
resources required by the remaining tasks, we will employ the same 
processor selection policy as 1) for task T. 

2.3) If there is an intersection between the resources required by task T and the 
resources required by remaining tasks and the modes used by task T and 
remaining tasks to access the resources in this intersection are all shared, we 
will employ the same processor selection policy as 1) for task T. 

2.4) In other cases, our policy used to choose the processor for task T is listed 
below: 

2.4.1) If $r_T < EST_{R_T}$ and $EST_{R_T} = \max_{pe \in \{pe \mid pe \in PE \text{ and } avail(T,pe)=1\}} availtime(pe)$, we
will employ the same processor selection policy as 1) for task T.

2.4.2) If $r_T > EST_{R_T}$ and $r_T > \max_{pe \in \{pe \mid pe \in PE \text{ and } avail(T,pe)=1\}} availtime(pe)$, we will
employ the same processor selection policy as 1) for task T.

2.4.3) If $r_T < EST_{R_T}$ and $\max_{pe \in \{pe \mid pe \in PE \text{ and } avail(T,pe)=1\}} availtime(pe) > EST_{R_T} > \min_{pe \in PE} availtime(pe)$,
we will choose the processor P that meets the following constraint for task T:

$$avail(T,P) = 1 \text{ and } availtime(P) = EST_{R_T}.$$

2.4.4) If $r_T < \min_{pe \in PE} availtime(pe)$ and $EST_{R_T} < \min_{pe \in PE} availtime(pe)$, we
will choose the processor P that meets the following constraint for task T:

$$availtime(P) = \min_{pe \in \{pe \mid pe \in PE \text{ and } avail(T,pe)=1\}} availtime(pe).$$

2.4.5) In other cases, we will choose the processor P that meets the following
constraint for task T:

$$r_T - availtime(P) = \min_{pe \in \{pe \mid pe \in PE,\ avail(T,pe)=1,\ availtime(pe) \le r_T\}} \big(r_T - availtime(pe)\big) \text{ and } avail(T,P) = 1.$$
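Rule 1 above can be contrasted with the myopic assignment policy described in the introduction in a few lines. This is a sketch with hypothetical function names. It assumes that avail(T,P) = 1 means that T, started at max($r_T$, availtime(P)), finishes by $d_T$. Minimizing gaptime among feasible processors is the same as picking the feasible processor that becomes available latest.

```python
def feasible(ready, comp, deadline, availtime):
    """Processors P with avail(T,P) = 1 under the assumed reading of Definition 2.7."""
    return [p for p, t in enumerate(availtime) if max(ready, t) + comp <= deadline]

def myopic_choice(ready, comp, deadline, availtime):
    """Myopic policy: earliest available processor that can still meet the deadline."""
    cands = feasible(ready, comp, deadline, availtime)
    return min(cands, key=lambda p: availtime[p]) if cands else None

def thrift_rule1(ready, comp, deadline, availtime):
    """Thrift rule 1: feasible processor with the smallest gaptime(T,P) = d_T - availtime(P)."""
    cands = feasible(ready, comp, deadline, availtime)
    return min(cands, key=lambda p: deadline - availtime[p]) if cands else None
```

With processors available at times [0, 6] and a task with $r_T = 0$, $C_T = 3$, $d_T = 10$, the myopic policy picks processor 0 and rule 1 picks processor 1. A later task with $r = 0$, $C = 8$, $d = 9$ then still fits on processor 0 after the thrift choice (availability [0, 9]). After the myopic choice (availability [3, 6]) it fits on neither processor.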

Thrift algorithm is given below: 

(We assume that the size of the feasibility check window is K, task T’s deadline is
denoted by $d_T$, and the weight is W.)

1) Order the tasks in the task queue in non-decreasing order of their deadlines and 
then start with an empty partial schedule. 

2) Determine whether the current partial schedule is strongly feasible by
performing the feasibility check for the (at most K) tasks in the feasibility check
window. If the current partial schedule is strongly feasible, feasible = true;
otherwise feasible = false.

3) If feasible = true:

3.1) Compute the heuristic function value H for the (at most K) tasks in
the feasibility check window, where $H(T) = d_T + W \cdot IEST(T)$ for task T.

3.2) Extend the schedule by the task T having the smallest H value in the
feasibility check window and choose a suitable processor for task T by
using the function choosep(T).

Else:

3.3) Backtrack to the previous search level.

3.4) Extend the schedule by the task T’ having the next smallest H value at
this search level and choose a suitable processor for task T’ by using
the function choosep(T’).

4) Move the feasibility check window by one task.

5) Repeat steps 2-4 until one of the termination conditions listed below is met:

a) a complete feasible schedule has been found; 

b) the maximum number of backtracks or H function evaluations has been
reached;

c) no more backtracking is possible. 
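The steps above can be sketched as a compact search loop. This is a simplified sketch, not the authors' implementation, and it rests on several assumptions. Only processors are modelled (no other resources), so IEST(T) reduces to max($r_T$, min availtime(P)) and choosep reduces to rule 1. A task counts as feasible when it can finish by its deadline on some processor. Termination is bounded by backtracks only. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    ready: int     # r_T
    comp: int      # C_T (worst-case computation time)
    deadline: int  # d_T

def choose_rule1(task, avail):
    """Feasible processor with the smallest gaptime = d_T - availtime(P), or None."""
    cands = [p for p, t in enumerate(avail)
             if max(task.ready, t) + task.comp <= task.deadline]
    return min(cands, key=lambda p: task.deadline - avail[p]) if cands else None

def iest(task, avail):
    # Processor-only IEST: max(r_T, min_P availtime(P)); resources are ignored.
    return max(task.ready, min(avail))

def extend(task, remaining, avail, schedule):
    """Append task to the partial schedule on the processor picked by choose_rule1."""
    p = choose_rule1(task, avail)
    start = max(task.ready, avail[p])
    avail[p] = start + task.comp
    remaining.remove(task)
    schedule.append((task.name, p, start))

def thrift_sketch(tasks, m, K=3, W=1, max_backtracks=10):
    """Return [(name, processor, start)], or None if the search gives up."""
    remaining = sorted(tasks, key=lambda t: t.deadline)   # step 1
    avail, schedule = [0] * m, []
    levels, backtracks = [], 0   # per level: saved state + untried alternatives
    while remaining:
        window = remaining[:K]                             # feasibility check window
        if all(choose_rule1(t, avail) is not None for t in window):  # step 2
            order = sorted(window, key=lambda t: t.deadline + W * iest(t, avail))
            levels.append((remaining[:], avail[:], schedule[:], order[1:]))
            task = order[0]                                # step 3.2: smallest H
        else:
            while levels and not levels[-1][3]:            # drop exhausted levels
                levels.pop()
            if not levels or backtracks >= max_backtracks:
                return None                                # termination b) or c)
            backtracks += 1
            saved_remaining, saved_avail, saved_schedule, alternatives = levels[-1]
            task = alternatives.pop(0)                     # step 3.4: next smallest H
            remaining = saved_remaining[:]
            avail = saved_avail[:]
            schedule = saved_schedule[:]
        extend(task, remaining, avail, schedule)           # window moves by one task
    return schedule                                        # termination a)
```

With one processor, W = 10, and tasks A ($r=0, C=4, d=10$) and B ($r=1, C=1, d=3$), the large weight makes H prefer A first. B then misses its deadline, and one backtrack yields the schedule B at time 1 followed by A at time 2.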



3.2 Complexity

We assume that the real-time system consists of n tasks, M processors and S
resources, and that the size of the feasibility check window is K. The thrift algorithm has n steps,
and in each step only the K tasks in the feasibility check window are considered when performing the
feasibility check and computing the heuristic function value. Furthermore,
at most M processors are considered when the function choosep is called in each step,
and only S entries are checked when the task-resource list is searched.
Thus, the complexity of the thrift algorithm is $O((K+M+S)n)$, denoted by
$O(K'n)$ with $K' = K+M+S$. The complexity of the thrift algorithm is similar to that of the myopic algorithm and
proportional to the number of tasks in the task queue, since both M and S are constant
and much smaller than n.



4 Simulation Studies 

To study the effectiveness of the thrift algorithm, we have conducted extensive
simulation studies. Since the scheduling success ratio is the most important metric
used in the performance evaluation of real-time scheduling algorithms, we only
consider this metric in our simulation studies.

In these simulation studies, we use the method in [1] to generate 200 schedulable
task sets. Each task set contains 40 to 80 tasks, with the schedule length [1] fixed at
800. The simulation parameters and their values are described in Table 4.1.
Figures 4.1, 4.2, 4.3, 4.4 and 4.5 show the scheduling success ratio when varying the
Backnum, W, K, Use_P and R values, respectively. (MA denotes the myopic algorithm,
TA the thrift algorithm, and SR the scheduling success ratio.)

4.1 Effect of Maximum Number of Backtracks

Figure 4.1 shows the effect of the maximum number of backtracks on the scheduling
success ratio. In this simulation, we fix the Use_P, Share_P, K, W and R values at
0.2, 0.5, 7, 8 and 0.2, respectively. The simulation result shows that increasing the
maximum number of backtracks increases the SR of both algorithms, although the increase
is small. Furthermore, the SR of the thrift algorithm is always higher than that of the
myopic algorithm as the maximum number of backtracks varies.



4.2 Effect of Weight
Figure 4.2 shows the effect of the weight on SR. We set the Use_P, R, Share_P, K and
Backnum values to 0.2, 0.2, 0.5, 7 and 10, respectively. The simulation result shows
that the SR of the thrift algorithm is always higher than that of the myopic algorithm as the
weight changes. The SR of the myopic algorithm increases as W varies from 0 to 4 and
decreases when W is increased beyond 4, while the SR of the thrift algorithm increases as
W varies from 0 to 6 and decreases when W is increased beyond 6. Moreover, the
myopic algorithm is more sensitive to changes in W. This is because when W is
very large, the integrated heuristics used by both algorithms behave like simple
heuristics, which only take into account the availability of processors and resources and
ignore the tasks’ deadlines. Similarly, when W = 0, the SR of both algorithms is very poor,
since the integrated heuristics reduce to EDF (earliest deadline first).



Table 4.1 Simulation Parameters

Parameter   Explanation
Procnum     Number of processors considered for simulation, taken as 3.
Resnum      Number of resource types considered for simulation, taken as 2.
Backnum     Maximum number of backtracks permitted in the search.
K           Size of the feasibility check window.
W           Weight in the H function.
Max_exec    Maximum computation time of tasks, taken as 60.
Min_exec    Minimum computation time of tasks, taken as 30.
R           Laxity of tasks; it denotes the tightness of the deadline.
Use_P       Probability that a task uses a resource.
Share_P     Probability that a task uses a resource in shared mode.


Figure 4.1 Effect of number of backtracks (x-axis: number of backtracks; curves: TA, MA)

Figure 4.2 Effect of weight (x-axis: weight; curves: TA, MA)
4.3 Effect of Size of Feasibility Check Window

The effect of the size of the feasibility check window on the scheduling success ratio is
shown in Figure 4.3, with the Use_P, Share_P, W, Backnum and R values fixed at 0.2, 0.5,
8, 10 and 0.2, respectively. From the simulation result, we observe that the SR of the thrift
algorithm is higher than that of the myopic algorithm when the K values of the two
algorithms are equal. The SR of both algorithms increases as K increases. This is
because for large K values, both algorithms look further ahead, which
improves the scheduling success ratio.




Figure 4.3 Effect of size of feasibility check window

Figure 4.4 Effect of resource usage (x-axis: Use_P)

4.4 Effect of Resource Usage 

Figure 4.4 shows the effect of resource usage on the scheduling success ratio,
with the Share_P, K, W, Backnum and R values fixed at 0.5, 7, 8, 10 and 0.2, respectively. From
Figure 4.4, it is noted that the SR of both algorithms increases with increasing resource
usage probability. This is because an increase in the resource usage probability results in
more resource conflicts among tasks and makes a task’s ideal earliest start time more
sensitive to the available time of resources. Furthermore, the SR of the thrift algorithm is always
higher than that of the myopic algorithm as Use_P changes. The reason for
this is that decreasing Use_P makes the earliest available time of all
processors in the system smaller after a task’s execution and increases the feasibility of
unscheduled tasks.

4.5 Effect of Laxity 

The effect of laxity on the scheduling success ratio is shown in Figure 4.5. In this
simulation, the Use_P, Share_P, K, W and Backnum values are fixed at 0.2, 0.5, 7, 8
and 10, respectively. The simulation result reveals that as the laxity increases,
the SR of both algorithms increases, and the SR of the thrift algorithm is superior to that of the
myopic algorithm. However, the advantage of the thrift algorithm in SR decreases as
the laxity increases. This is because a larger laxity makes it more likely that task
deadlines can be met using the task assignment policy of the myopic algorithm alone.
Figure 4.5 Effect of laxity



5 Conclusions 

For real-time systems, especially hard real-time systems, meeting deadlines is the most
important goal of task scheduling. Therefore, the most important
performance metric for a scheduling algorithm is the scheduling success ratio. Several
scheduling algorithms based on the myopic algorithm have been proposed for
dynamic real-time multiprocessor systems. However, the task assignment policies
used in these algorithms, including the myopic algorithm itself, hinder the improvement
of the scheduling success ratio. To solve this problem, we propose a new dynamic
scheduling algorithm for real-time multiprocessor systems, called the thrift algorithm.

In the thrift algorithm, we develop a new task assignment policy. By delaying the
task’s actual start time as much as possible without missing the task’s deadline, this policy
improves the algorithm’s scheduling success ratio, since it increases the feasibility
of unscheduled tasks. The simulation results demonstrate that the scheduling success
ratio of the thrift algorithm is superior to that of the myopic algorithm for a wide variety of
task parameters. Moreover, the complexity of the thrift algorithm is similar to that of the
myopic algorithm.
DERIVING MESSAGE PASSING PROTOCOLS FROM COLLECTIVE BEHAVIOR

Pertti Kellomäki



1. Introduction 

Many embedded systems applications involve distributed co-operating components. 
For cost reasons, they must be designed to use computational and communication 
resources as sparingly as possible. To achieve this, carefully hand-optimized 
solutions are often used. However, reasoning about such solutions at the 
implementation level using deductive techniques is like trying to verify the object 
code produced by an optimizing compiler: much of the structure and abstractions 
present at the high level of design have been lost. Recovering abstractions can be 
quite expensive, necessitating the formulation and verification of large strengthening 
invariants. 

In this paper we hope to demonstrate that verifiability and efficiency are not 
mutually exclusive. One can write specifications that convey the designer’s intent at 
an abstract level and are amenable to formal verification, and at the same time yield 
optimized implementation level solutions. Implementation level efficiency does not 
need to suffer from the abstractions built into a specification. 

We argue that both design and verification are easier with a methodology that 
starts with the collective behavior and explicitly represents the underlying 
abstractions. When concurrent activities involve multiple interacting objects,
coordination of the activities is difficult. In the telecommunications field, this
problem, known as feature interaction, is a well-known source of difficulties. We try to avoid these
problems by starting with a specification of the high level collective behavior that 
makes the atomicity of high level operations explicit. The paper gives 
methodological guidelines to help a designer in deriving low level message passing 
protocols from high level specifications of collective behavior. We also show how 
the methodology is used in a derivation and verification of a protocol achieving 
atomic insertions and deletions in a distributed linked ring, borrowed from [PD96]. 
The rest of the paper is structured as follows. In Section 2 we describe the use of
superposition and the DisCo specification method. Section 3 describes the 
specification methodology for deriving a message passing implementation from a 
specification of collective behavior. An example derivation is given in Section 4, 
related work is reviewed in Section 5, and conclusions are drawn in Section 6. 



2. The DisCo approach 



In this section we describe the joint action [BKS88,BKS89] approach to 
specification and the use of superposition, which form the foundations of the DisCo 
[WWW00,JKSSS90,JKS91,KS96] specification method. 

2.1. Joint actions

A joint action is a formula for a multi-party operation that describes a step in a 
computation, involving a synchronization of the participants of the action. An 
action may have parameters, which are similar to participants except that they 
denote values, not objects with mutable state. The values assigned to the variables of 
the participants in an execution of an action may freely refer to the attributes of the 
other participants and the parameters of the action. 

A joint action defines inter-object cooperation at a high level of abstraction, 
where the focus is lifted from communication details and the behavior of individual 
objects to the collective behavior. The states of the objects that do not participate in 
the execution of an action remain unchanged. Parallel activities are modeled with an 
interleaving semantics. 

The following specification illustrates how classes and actions are defined using 
the DisCo syntax: 

system L1 is

class C is x : integer; end;
action inc(n : integer) by a, b : C is
when a.x < b.x do a.x := b.x + n; b.x := a.x + n; end;
end;

If a combination of objects satisfying the guard of the action can be found, the 
action is enabled for the objects and can be executed. The choice of which enabled 
action to execute and for which combination of participants and parameters is 
nondeterministic. 
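The execution model of action inc can be illustrated by enumerating all enabled combinations of participants and parameters and picking one at random as one interleaved step. This is a Python sketch, not DisCo. The parameter values are drawn from a finite list supplied by the caller. The assignments in the action body are applied in textual order, which is an assumption about the body's semantics that the text above does not settle. All function names are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import permutations

@dataclass
class C:
    x: int

def inc_enabled(objs, params):
    """All (a, b, n) combinations for which the guard a.x < b.x holds."""
    return [(a, b, n) for a, b in permutations(objs, 2) for n in params if a.x < b.x]

def inc_body(a, b, n):
    # Assumption: assignments take effect in textual order.
    a.x = b.x + n
    b.x = a.x + n

def step(objs, params, rng):
    """One interleaved step: nondeterministically pick an enabled combination."""
    enabled = inc_enabled(objs, params)
    if not enabled:
        return None
    a, b, n = rng.choice(enabled)
    inc_body(a, b, n)  # objects that do not participate are left unchanged
    return a, b, n
```

A seeded random.Random stands in for the nondeterministic choice, so a run can be replayed.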

2.2. Superposition 

Stepwise refinement in the DisCo approach is based on superposition, where new 
state variables are incrementally added to a specification until the desired level of 
detail is reached. The actions of the specification being refined may be augmented 
with assignments to the new state variables, but new assignments to state variables
introduced earlier may not be given. The initial condition and the guards of the
actions may be strengthened.

Superposition preserves safety properties by construction, since all assignments to 
a state variable are given when the state variable is introduced. DisCo specifications 
are thus closed world specifications. 

Nondeterministic action parameters can be used for modeling values determined 
by state variables not visible at a given level of abstraction. By augmenting actions 
with new constraints on the values the parameters may assume, one can constrain the 
nondeterminism at a later stage. 

The following layer illustrates how superposition is done in the DisCo language. 

system L2 import L1; is
class stepper is step : integer; end;
extend C by counter : integer; end;

refined inc by s : stepper is when ... n = s.step do
... s.step := s.step + 1;

a.counter := a.counter + 1; b.counter := b.counter + 1;

end;

end;

The ellipsis “...” denotes parts from the action being refined. Superimposing L2
on L1 is expressed in DisCo by importing specification L1 into L2.
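The refinement of inc in L2 can be mimicked in Python to show why superposition preserves the safety properties of L1. The refined guard only strengthens the old one. The refined body keeps L1's assignments to x unchanged and assigns only the new variables step and counter. As a result, every L2 step, projected onto the L1 variables, is an L1 step. This is a sketch with hypothetical function names, and it again assumes that assignments take effect in textual order.

```python
from dataclasses import dataclass

@dataclass
class C:
    x: int
    counter: int = 0   # added in L2 by "extend C by counter"

@dataclass
class Stepper:
    step: int

# Layer L1: action inc(n) by a, b : C
def l1_guard(a, b, n):
    return a.x < b.x

def l1_body(a, b, n):
    # Assumption: assignments take effect in textual order.
    a.x = b.x + n
    b.x = a.x + n

# Layer L2: refined inc by s : stepper
def l2_guard(a, b, n, s):
    return l1_guard(a, b, n) and n == s.step   # the guard is only strengthened

def l2_body(a, b, n, s):
    l1_body(a, b, n)          # L1 assignments are kept unchanged
    s.step = s.step + 1       # new assignments touch only new variables
    a.counter = a.counter + 1
    b.counter = b.counter + 1
```

The extra conjunct n = s.step shows the parameter constraint described earlier: the L1 parameter n, free at the abstract level, is now determined by the new state variable s.step.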



3. Specification methodology

In this section we describe how an abstract specification of the collective behavior of 
a distributed system can be systematically transformed into an implementable 
specification that employs message passing. 

The most abstract specification gives the effects of the operations on the objects 
of the system using joint actions. At this level of abstraction, an action can access the 
attributes of all the participants, irrespective of whether it is physically possible to 
access the objects simultaneously in the actual system. 

In the next step, distribution is introduced as abstractly as possible. For each class
C whose attribute X needs to be accessed remotely, we introduce a class mC of
mobile objects. Classes C and mC are augmented with new variables x and valid.
The value of X is stored in either C.x or mC.x, as indicated by valid. To associate an
object of class mC with an object of class C, class mC contains a reference
represents. The invariant that allows treating X as a “shadow” variable that need not
be explicitly represented in an implementation is

$$\forall c : C,\ mc : mC :: mc.represents = c \Rightarrow \big( (c.valid \Rightarrow c.X = c.x) \wedge (mc.valid \Rightarrow c.X = mc.x) \big).$$
The mobile object can be thought of as a representative of the corresponding
immobile object. Accessing a shadow variable of an object does not require 
synchronization with the object, provided that its representative is present. 
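The linking invariant can be checked on concrete states. The sketch below moves the stored value of X from an immobile object to its representative and then updates X through the representative alone. The invariant holds before and after each step. The actions hand_over and remote_write are hypothetical illustrations, not actions from the paper. Guards are written as assertions for brevity.

```python
from dataclasses import dataclass

@dataclass
class Immobile:          # an object of class C
    X: int               # abstract attribute; a shadow variable in the implementation
    x: int = 0           # stored copy of X
    valid: bool = True   # True when c.x holds the current value of X

@dataclass
class Mobile:            # an object of class mC, the representative of `represents`
    represents: Immobile
    x: int = 0
    valid: bool = False

def invariant(cs, mcs):
    """For mc.represents = c: (c.valid => c.X = c.x) and (mc.valid => c.X = mc.x)."""
    return all((not c.valid or c.X == c.x) and (not mc.valid or c.X == mc.x)
               for mc in mcs for c in cs if mc.represents is c)

def hand_over(c, mc):
    """Hypothetical action: move the stored value of X from c to its representative."""
    assert mc.represents is c and c.valid
    mc.x, mc.valid = c.x, True
    c.valid = False

def remote_write(mc, value):
    """Hypothetical action: update X by assigning to the representative only."""
    assert mc.valid
    mc.x = value
    mc.represents.X = value   # abstract-level effect; X itself need not be stored
```

Because remote_write touches only the representative (and the shadow variable X), no synchronization with the immobile object is needed while the representative holds the valid copy.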

What distinguishes our approach is that at the specification level, the mobile 
objects have mutable state. Instead of sending and receiving immutable messages, an 
object communicates with a remote object by assigning values to the variables of the 
representative of the remote object. Modeling distribution in this fashion introduces 
the essential aspects of distribution (asynchronous changes to state variables) while 
keeping the model as simple as possible. The simple model means that deductive 
formal verification of the invariant is almost trivial. 
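A small Python model can make the shadow-variable idea concrete. It is a sketch under our own assumptions: the classes `C` and `MC` and the actions `send_representative`, `remote_write` and `receive_representative` are hypothetical names, and the abstract attribute `X` is kept as an ordinary field only so that the linking invariant can be checked after every step.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class C:
    X: int            # abstract (shadow) attribute, kept only for checking
    x: int            # the object's own copy of X
    valid: bool = True


@dataclass(eq=False)
class MC:
    represents: C
    x: int = 0        # the mobile representative's copy of X
    valid: bool = False


def invariant(c: C, mc: MC) -> bool:
    """mc.represents = c => (c.valid => c.X = c.x) and (mc.valid => c.X = mc.x)."""
    if mc.represents is not c:
        return True
    return (not c.valid or c.X == c.x) and (not mc.valid or c.X == mc.x)


def send_representative(c: C, mc: MC) -> None:
    """Hand the value of X over to the representative."""
    if not (c.valid and mc.represents is c):
        raise ValueError("guard false")
    mc.x, mc.valid, c.valid = c.x, True, False


def remote_write(mc: MC, value: int) -> None:
    """A remote object assigns to the representative only.

    The shadow X of the represented object changes, but its
    implementation variables c.x and c.valid are not touched."""
    if not mc.valid:
        raise ValueError("guard false: representative not present")
    mc.x = value
    mc.represents.X = value       # abstract effect of the joint action


def receive_representative(c: C, mc: MC) -> None:
    if not (mc.valid and mc.represents is c):
        raise ValueError("guard false")
    c.x, c.valid, mc.valid = mc.x, True, False


c = C(X=5, x=5)
mc = MC(represents=c)
checks = [invariant(c, mc)]
for step in (lambda: send_representative(c, mc),
             lambda: remote_write(mc, 7),
             lambda: receive_representative(c, mc)):
    step()
    checks.append(invariant(c, mc))
print(all(checks), c.x, c.valid)  # True 7 True
```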

Introducing representative objects differs from ordinary data abstraction in that it 
is the location, not the representation, of data that is being abstracted. Once the 
representative objects have been introduced, we can use data abstraction to derive the 
implementation level messages. The step from the abstract behavior to representative 
objects is standard enough that it could be supported by parametric superposition 
steps [KM00]. The linking invariant could then be verified once for the parametric 
layer, and reused by verifying the assumptions of the layer. 

The resulting specification describes a highly nondeterministic system. The mobile 
objects are free to roam about and interact with other objects at will. To 
constrain this nondeterminism, we introduce a state machine in the representative 
objects. When the valid attribute of an object is false, the state machine is in a state 
labeled no_message. For each interaction we wish the object to participate in, we 
introduce a corresponding state in the state machine and refine the corresponding 
action to require that the representative object is in a suitable state. 

The attributes mc.valid, mc.represents, and mc.x can be made shadow variables 
by using the state machine. The state machine itself implements mc.valid if we 
arrange for it to exit state no_message in every action where valid is 
assigned the value true, and to enter state no_message in every action where valid is 
assigned the value false. For mc.represents and mc.x to be implemented, each 
state of the state machine needs attributes from which they can be 
computed. Since mc.x is not accessed when mc.valid is false, mc.x can be set to some 
known constant value when mc.valid is set to false, so only mc.represents needs to 
be implemented in state no_message. 

For example, if we wanted to implement a client-server protocol where clients 
send requests to servers, we would introduce a state machine with states 
no_message, request, and reply. State request would contain attributes from 
(implementing mc.represents) and current_x (implementing mc.x), and state reply 
would contain attributes to (implementing mc.represents) and new_x (implementing 
mc.x). State no_message would contain an attribute implementing mc.represents. 
The following table illustrates how C.X is implemented by the lower level variables: 



   C.X
      implemented by C.x
      implemented by mC.x
         implemented by mC.request.current_x
         implemented by mC.reply.new_x
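The table can also be read as a function from lower level state to the abstract value. In this hypothetical Python sketch, the states of the representative's state machine are modelled as frozen dataclasses; `owner` is an invented name for the attribute of no_message that implements mc.represents, and `from_` stands for from because `from` is a Python keyword.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NoMessage:
    owner: str          # hypothetical name; implements mc.represents


@dataclass(frozen=True)
class Request:
    from_: str          # implements mc.represents
    current_x: int      # implements mc.x


@dataclass(frozen=True)
class Reply:
    to: str             # implements mc.represents
    new_x: int          # implements mc.x


RepState = Union[NoMessage, Request, Reply]


def mc_valid(state: RepState) -> bool:
    """The state machine implements mc.valid: false exactly in no_message."""
    return not isinstance(state, NoMessage)


def mc_represents(state: RepState) -> str:
    if isinstance(state, NoMessage):
        return state.owner
    if isinstance(state, Request):
        return state.from_
    return state.to


def abstract_x(c_valid: bool, c_x: int, state: RepState) -> int:
    """Recover C.X from the lower level variables, following the table."""
    if c_valid:
        return c_x
    if isinstance(state, Request):
        return state.current_x
    if isinstance(state, Reply):
        return state.new_x
    raise ValueError("neither copy of X is valid")


print(abstract_x(False, 3, Request("client1", 4)))  # 4
```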
Figure 1. Overview of the specification 

The final step to a message passing implementation is to observe that we can 
interpret state no_message as the nonexistence of a message and the remaining states 
of the state machine as messages. A state transition of the state machine 
corresponds to receiving a message and sending a new message. Since the attributes 
contained in the states are only assigned to when the state is entered, the messages do 
not need to be mutable. 
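The message interpretation can be sketched in Python as follows, assuming a single representative whose `msg` field is `None` exactly in state no_message (the attribute of no_message is omitted for brevity). The class and method names are hypothetical. Each transition replaces the whole message object instead of updating it, and the messages are frozen dataclasses, reflecting the observation that state attributes are assigned only on entry.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Request:
    from_: str          # sender; "from" is a Python keyword
    current_x: int


@dataclass(frozen=True)
class Reply:
    to: str
    new_x: int


Message = Union[Request, Reply]


class Representative:
    """One state machine; msg is None exactly in state no_message."""

    def __init__(self) -> None:
        self.msg: Optional[Message] = None

    def send_request(self, client: str, x: int) -> None:
        if self.msg is not None:
            raise ValueError("guard false: a message is already in transit")
        self.msg = Request(client, x)                    # no_message -> request

    def serve(self, update: Callable[[int], int]) -> None:
        if not isinstance(self.msg, Request):
            raise ValueError("guard false: no request to serve")
        req = self.msg                                   # receive the request...
        self.msg = Reply(req.from_, update(req.current_x))  # ...and send a reply

    def receive_reply(self, client: str) -> int:
        if not (isinstance(self.msg, Reply) and self.msg.to == client):
            raise ValueError("guard false: no reply for this client")
        value, self.msg = self.msg.new_x, None           # reply -> no_message
        return value


rep = Representative()
rep.send_request("client1", 41)
rep.serve(lambda x: x + 1)
print(rep.receive_reply("client1"), rep.msg)             # 42 None

try:
    Request("client1", 1).current_x = 2
except FrozenInstanceError:
    print("messages are immutable")
```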

The construction outlined above results in a correct but potentially inefficient 
implementation. It leaves some aspects underspecified, e.g. how messages find their 
way to their recipients. These aspects can be specified using further superposition 
steps, with the guarantee that the steps cannot invalidate the correctness properties 
provided by the higher levels of specification. The derived protocol can also be 
optimized by means of verifying additional invariants, as illustrated in the next 
section. 



4. An example: distributed ring 

The example presented in this section is based on the description in [PD96]. The 
objective of the protocol is to maintain a singly linked distributed ring of cells 
(processors interested in a particular cache line in a multiprocessor system). A cell 
not in the ring may request admission to the ring, and a cell in the ring may request 
to be removed. 

Figure 1 depicts the structure of the specification. The three layers abstract, 
distribution and messages correspond to the layers described in Section 3. Layer 
add_details specifies further details of adding a cell. Layer token describes how 
activities are coordinated using a token, and layer delete_details specifies how this 
coordination is applied to deletions. Layer unoptimized merges the two branches into 
a specification of a complete but inefficient protocol. Layer optimized specifies a 
more efficient protocol in which a message and some message fields have been 
shown to be redundant by verifying additional invariants. 

We outline the specification here; the full specification can be found at 
http://disco.cs.tut.fi. 
4.1. The collective behavior 

The specification abstract of the collective behavior gives add and delete as atomic 
operations on the ring in terms of variable anext, the specification level 
variable linking cells in the ring. The specification is short enough to be given in 
its entirety: 



system abstract is
   class cell is anext : cell; end;

   action add by prev, c : cell is
   when c.anext = c do
      c.anext := prev.anext; prev.anext := c;
   end;

   action delete by prev, c : cell is
   when prev.anext = c do
      prev.anext := c.anext; c.anext := c;
   end;
end;

The specification does not constrain where cells may be added or removed, nor 
does it specify when these actions should take place. This is intentional: the 
specification only gives the essential core behavior. 
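Because layer abstract is so small, it can be executed directly. The Python sketch below is our own translation, not DisCo tooling: cells are strings, `anext` is a dictionary, a cell whose `anext` is itself is not linked to any other cell, and a guard that does not hold raises an error instead of leaving the action disabled.

```python
from typing import Dict, List

Ring = Dict[str, str]   # cell name -> anext; a self-loop means "not linked"


def add(anext: Ring, prev: str, c: str) -> None:
    """action add by prev, c : cell is when c.anext = c do ..."""
    if anext[c] != c:
        raise ValueError("guard false: c is already linked")
    anext[c] = anext[prev]
    anext[prev] = c


def delete(anext: Ring, prev: str, c: str) -> None:
    """action delete by prev, c : cell is when prev.anext = c do ..."""
    if anext[prev] != c:
        raise ValueError("guard false: prev is not the predecessor of c")
    anext[prev] = anext[c]
    anext[c] = c


def ring_from(anext: Ring, start: str) -> List[str]:
    """Follow anext from start until the walk returns to start.

    Terminates because add and delete keep anext a permutation,
    so every cell lies on a cycle."""
    cells, cur = [start], anext[start]
    while cur != start:
        cells.append(cur)
        cur = anext[cur]
    return cells


anext = {"h": "h", "a": "a", "b": "b"}   # h forms a one-cell ring
add(anext, "h", "a")
add(anext, "h", "b")
print(ring_from(anext, "h"))             # ['h', 'b', 'a']
delete(anext, "b", "a")
print(ring_from(anext, "h"), anext["a"]) # ['h', 'b'] a
```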

4.2. Representatives 

Layer distribution introduces the representative objects. The layer makes anext a 
shadow variable, implementing it by the state variable next, as explained in Section 
3. The cell class is augmented with new state variables, and a new class 
representative is introduced: 

extend cell by valid : boolean; next : cell; end;
class representative is
   represents : cell; valid : boolean; next : cell;
end;

New actions send_representative and receive_representative are introduced, and 
actions add and delete are augmented to require a representative of the requesting 
object as a new participant, admitting a distributed implementation. The refinement 
of add illustrates how this is accomplished in DisCo: 

refined add by ... r : representative is
when ... prev.valid and r.valid and r.represents = c do
   prev.next := r.next; r.next := prev.anext; ...
end;
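The following Python sketch, with hypothetical names throughout, replays the refined add on concrete variables and checks the linking invariant of Section 3 specialised to the ring: a valid cell's `next` equals its `anext`, and a valid representative's `next` equals the `anext` of the cell it represents. The abstract `anext` is kept only so that the invariant can be checked.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Cell:
    anext: str            # abstract link from layer abstract (shadow variable)
    next: str             # concrete link, authoritative while valid
    valid: bool = True


@dataclass
class Representative:
    represents: str
    next: str = ""
    valid: bool = False


def linking_invariant(cells: Dict[str, Cell], reps: Iterable[Representative]) -> bool:
    """c.valid => c.anext = c.next, and r.valid => r.represents.anext = r.next."""
    cells_ok = all(c.anext == c.next for c in cells.values() if c.valid)
    reps_ok = all(cells[r.represents].anext == r.next for r in reps if r.valid)
    return cells_ok and reps_ok


def send_representative(cells: Dict[str, Cell], r: Representative) -> None:
    c = cells[r.represents]
    if not c.valid or r.valid:
        raise ValueError("guard false")
    r.next, r.valid, c.valid = c.next, True, False


def add(cells: Dict[str, Cell], prev_name: str, r: Representative) -> None:
    """Refined add: guard and body follow the DisCo fragment above."""
    prev, c = cells[prev_name], cells[r.represents]
    if not (c.anext == r.represents and prev.valid and r.valid):
        raise ValueError("guard false")
    prev.next = r.next              # prev.next := r.next
    r.next = prev.anext             # r.next := prev.anext
    c.anext = prev.anext            # "..." from layer abstract:
    prev.anext = r.represents       #   c.anext := prev.anext; prev.anext := c


cells = {"h": Cell("h", "h"), "a": Cell("a", "a")}
r = Representative("a")
send_representative(cells, r)
add(cells, "h", r)
print(linking_invariant(cells, [r]), cells["h"].next, r.next)  # True a h
```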

4.3. Implementation level messages 

Layer messages introduces a state machine in the representative objects as follows: 



extend representative by
   state *no_message(from_c : cell),
      add_request(from_c : cell; current_next : cell),
      add_reply(to_c : cell; new_next : cell),
      delete_request(from_c : cell; current_next : cell),
      delete_reply(to_c : cell; new_next : cell);
end;

Two refinements of action send_representative are introduced: send_add_request 
that sends a representative in state add_request, and send_delete_request that sends a 
representative in state delete_request. Action receive_representative is refined 
similarly. The variables contained in the state items implement variables valid, 
represents and next. 

4.4. Head cell 

We next refine the messages relating to addition and deletion in separate branches of 
the specification. In layer add_details, a dedicated head cell is introduced, and class 
cell is augmented with an attribute my_head. State add_request is augmented with 
to_c, the address of the recipient of the request, and action send_add_request is 
refined to send requests to the cell whose address is found in my_head. 
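One way to read this layer is that the head cell plays the role of prev in add. The Python sketch below follows that reading and is an assumption on our part, not the layer in the full specification: message and function names are hypothetical, the network is a plain FIFO list, and the head's handling mirrors the refined add (`prev.next := r.next; r.next := prev.anext`).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class AddRequest:
    from_c: str
    current_next: str
    to_c: str             # added in this layer: the recipient, taken from my_head


@dataclass(frozen=True)
class AddReply:
    to_c: str
    new_next: str


@dataclass
class Cell:
    next: str
    my_head: str


Network = List[Union[AddRequest, AddReply]]


def send_add_request(cells: Dict[str, Cell], c: str, net: Network) -> None:
    net.append(AddRequest(c, cells[c].next, cells[c].my_head))


def head_handles_request(cells: Dict[str, Cell], net: Network) -> None:
    req = net.pop(0)
    if not isinstance(req, AddRequest):
        raise ValueError("expected an add request")
    head = cells[req.to_c]
    old_next = head.next
    head.next = req.current_next                  # prev.next := r.next (= c here)
    net.append(AddReply(req.from_c, old_next))    # new_next := old prev.next


def receive_add_reply(cells: Dict[str, Cell], net: Network) -> None:
    reply = net.pop(0)
    if not isinstance(reply, AddReply):
        raise ValueError("expected an add reply")
    cells[reply.to_c].next = reply.new_next


cells = {"h": Cell("h", "h"), "a": Cell("a", "h"), "b": Cell("b", "h")}
net: Network = []
for joining in ("a", "b"):
    send_add_request(cells, joining, net)
    head_handles_request(cells, net)
    receive_add_reply(cells, net)
print({name: cell.next for name, cell in cells.items()})  # {'h': 'b', 'a': 'h', 'b': 'a'}
```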

4.5. Token 

Layer token introduces class token and actions to start and end an activity, which can be 
used for mutual exclusion. The safety property provided by the layer is that actions 
start_activity and stop_activity occur in a strictly alternating sequence: 

system token is
   class token is state *no_activity, activity_in_progress; end;
   assert single_token is ∀ t1, t2 : token :: t1 = t2;

   action start_activity by t : token is
   when t.no_activity do
      ->t.activity_in_progress;
   end;

   action stop_activity by t : token is
   when t.activity_in_progress do
      ->t.no_activity;
   end;
end;
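The alternation property can be exercised with a randomized Python sketch: a single `Token` object (matching single_token) whose two actions raise when their guard is false, driven by random choices. The names are hypothetical; the check `alternates` expresses the safety property stated above.

```python
import random
from typing import List


class Token:
    """Token from layer token: states no_activity and activity_in_progress."""

    def __init__(self) -> None:
        self.state = "no_activity"        # initial state (marked * in DisCo)
        self.trace: List[str] = []

    def start_activity(self) -> None:
        if self.state != "no_activity":
            raise ValueError("guard false")
        self.state = "activity_in_progress"
        self.trace.append("start")

    def stop_activity(self) -> None:
        if self.state != "activity_in_progress":
            raise ValueError("guard false")
        self.state = "no_activity"
        self.trace.append("stop")


def alternates(trace: List[str]) -> bool:
    """start and stop strictly alternate, beginning with start."""
    return all(ev == ("start" if i % 2 == 0 else "stop") for i, ev in enumerate(trace))


token = Token()                           # single_token: exactly one instance
rng = random.Random(0)
for _ in range(100):
    action = rng.choice([token.start_activity, token.stop_activity])
    try:
        action()
    except ValueError:
        pass                              # disabled actions simply do not fire
print(alternates(token.trace))            # True
```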

4.6. Coordinating deletions 

Deletions from the ring must be coordinated, because a deletion requires cooperation 
from the predecessor in the ring, and the predecessor must itself be in the ring for the 
deletion to succeed. Layer delete_details uses the token to this end. Composing 
send_delete_request and start_activity with the combined clause merges the 
actions from the imported layers into one atomic action. The safety property of 
layer token then ensures that actions c_send_delete_request and c_delete alternate: 

system delete_details import messages, token;
   combined c_send_delete_request
      of messages.send_delete_request, token.start_activity;
   combined c_delete of messages.delete, token.stop_activity;
is
   (details omitted)
end;
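The effect of the combined clause can be approximated in Python by an action whose guard is the conjunction of the constituent guards and whose body runs both bodies in one step. This model of `combined` is our assumption, and the stand-ins for `send_delete_request` and `delete` are hypothetical simplifications of layer messages; they only track which cells have a pending deletion.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, object]


@dataclass(frozen=True)
class Action:
    name: str
    guard: Callable[[State], bool]
    body: Callable[[State], None]


def combined(name: str, a: Action, b: Action) -> Action:
    """One atomic action: enabled only if both parts are enabled,
    and executing both bodies in a single step."""
    def body(s: State) -> None:
        a.body(s)
        b.body(s)
    return Action(name, lambda s: a.guard(s) and b.guard(s), body)


# Hypothetical stand-ins for the imported actions.
def send_delete_request(cell: str) -> Action:
    return Action(f"send_delete_request({cell})",
                  lambda s: cell not in s["pending"],
                  lambda s: s["pending"].add(cell))


def delete(cell: str) -> Action:
    return Action(f"delete({cell})",
                  lambda s: cell in s["pending"],
                  lambda s: s["pending"].discard(cell))


start_activity = Action("start_activity",
                        lambda s: s["token"] == "no_activity",
                        lambda s: s.update(token="activity_in_progress"))
stop_activity = Action("stop_activity",
                       lambda s: s["token"] == "activity_in_progress",
                       lambda s: s.update(token="no_activity"))

s: State = {"token": "no_activity", "pending": set()}
a_send = combined("c_send_delete_request", send_delete_request("a"), start_activity)
b_send = combined("c_send_delete_request", send_delete_request("b"), start_activity)
a_delete = combined("c_delete", delete("a"), stop_activity)

if a_send.guard(s):
    a_send.body(s)
print(b_send.guard(s))   # False: the token is busy with a's deletion
a_delete.body(s)
print(b_send.guard(s))   # True
```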

4.7. The unoptimized protocol 

The separate branches of specification are composed in layer unoptimized, which 
simply imports the detailed specifications of add messages and delete messages. The 
only new element is the initial condition stating that initially the head cell is in 
possession of the token. 

system unoptimized import add_details, delete_details; is
   initially head_has_token is
      ∀ hc : cell; t : token :: hc.my_head = hc => t.no_activity.to_c = hc;
end;

4.8. The optimized protocol 

The final layer specifies the optimized protocol by means of additional invariants. 
The invariants justify the omission of certain objects and attributes from an 
implementation. 

The first invariant concerns the token. The states no_activity and 
activity_in_progress were used in layer token to specify the mutual exclusion 
property. However, when the token is composed with the specification of delete messages, 
the following invariant holds in the composed system: 

$$\forall r : \mathit{representative},\ t : \mathit{token} ::\ r.\mathit{delete\_request} \Rightarrow t.\mathit{activity\_in\_progress}$$

The invariant allows us to implement the token only when it is in state no_activity. 

The implicit token is not present in the implementation level description of the 
protocol. However, having it as an explicit object in the specification makes it 
possible to address the concern of mutual exclusion in an independent layer, and to 
combine it with a specification of messages relating to deletion. 

The next invariant allows us to omit the new_next field of a delete reply. We 
observe that the value in delete_reply.new_next will always be equal to the recipient 
of the reply, expressed as 

$$\forall r : \mathit{representative} ::\ r.\mathit{delete\_reply} \Rightarrow r.\mathit{delete\_reply}.\mathit{new\_next} = r.\mathit{delete\_reply}.\mathit{to\_c}$$

This invariant justifies omitting the new_next field from a delete reply message. 
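The reason new_next is redundant can be seen in a Python sketch of the delete exchange. The message names follow the state items of layer messages; the handling functions are hypothetical and mirror the abstract delete (`prev.anext := c.anext; c.anext := c`). Since the reply is sent to c and tells c to point to itself, `new_next` always equals `to_c`, and the optimized handler needs only `to_c`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DeleteRequest:
    from_c: str
    current_next: str


@dataclass(frozen=True)
class DeleteReply:
    to_c: str
    new_next: str


def predecessor_handles(nxt: Dict[str, str], prev: str,
                        req: DeleteRequest) -> DeleteReply:
    """prev.next := c.next (carried in the request); reply tells c to point to itself."""
    if nxt[prev] != req.from_c:
        raise ValueError("guard false: prev is not the predecessor of from_c")
    nxt[prev] = req.current_next
    # Abstract delete sets c.anext := c and the reply goes to c,
    # so new_next always equals to_c.
    return DeleteReply(to_c=req.from_c, new_next=req.from_c)


def leaving_cell_handles(nxt: Dict[str, str], reply: DeleteReply) -> None:
    nxt[reply.to_c] = reply.new_next


def leaving_cell_handles_optimized(nxt: Dict[str, str], to_c: str) -> None:
    """Optimized variant: the reply carries only to_c."""
    nxt[to_c] = to_c


nxt = {"h": "a", "a": "b", "b": "h"}
reply = predecessor_handles(nxt, "h", DeleteRequest("a", nxt["a"]))
print(reply.new_next == reply.to_c)   # True
leaving_cell_handles(nxt, reply)
print(nxt)                            # {'h': 'b', 'a': 'a', 'b': 'h'}
```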

Not surprisingly, the protocol derived here is essentially the same as described in 
[PD96]. However, we believe that the derivation from collective behavior makes the 
specification easier to understand. It certainly makes it easier to verify than the 
original implementation-level description. 
5. Discussion 

Since real systems are mostly too complicated to be verified at the implementation 
level, abstraction has been one of the central themes in verification (e.g. 
[BH99,CGL94,LGSBB95,LS84]). The standard way of using abstraction is to 
provide an abstraction function (abstraction relation) that links states in an abstract 
specification with states in a more concrete specification. The abstraction function is 
chosen in such a way that verifying a property for the abstract specification implies 
that it also holds for the more concrete specification. 

Our approach differs from the usual use of abstraction in that we do not use an 
explicit abstraction function. When superposition is used, state variables of the 
abstract specification are also present in the more concrete specification, so instead 
of an abstraction function linking variables in two separate specifications, we link the 
abstract and concrete variables with an invariant in one specification. 

Verification based on explicit or implicit state space exploration has made 
impressive progress in recent years. However, it appears that there are still 
difficulties with verification of systems where the number of objects is arbitrary and 
the identities of the objects cannot be abstracted away [PD97]. Deductive techniques 
such as mechanical theorem proving do not share this difficulty. 

Often one of the main problems in verifying a system is to invent the correct 
abstractions. We try to alleviate this problem by outlining a systematic methodology 
for building useful abstractions for a specific class of systems. Expressing the 
abstractions as first class entities in the concrete specification helps in understanding 
the low level behavior in terms of the higher level abstractions. Some properties of 
the system are also nicely captured by the use of object containment. Park and Dill 
[PD96] verify the linked ring protocol described in this paper and comment that they 
have chosen to represent the network “in a non-obvious way” so that there is one 
variable per cell to hold a message pertaining to the cell. They then prove that the 
network can indeed be represented in this way without messages being overwritten. 
In our formulation, this property is a direct consequence of messages being 
represented as states of a state machine contained in a representative object. 



6. Conclusions 

We have demonstrated how message passing protocols can be systematically derived 
from descriptions of the high level collective behavior. Such a derivation results in 
an underspecified description of the protocol that can be further refined using 
superposition steps that preserve safety properties by construction. The derivation 
thus provides the core correctness properties of the protocol, with the guarantee that 
the mechanisms introduced in later steps cannot invalidate them. The protocol can be 
optimized by means of verifying additional invariants. 

Verification of the derived protocols is considerably easier than verifying 
equivalent implementation-level specifications. This is due to the abstractions built 
into the specification by the superposition process. Some parts of the derivation can 
be supported by parametric superposition steps. 
We believe that our approach provides considerable advantages when the system 
being specified involves concurrent interactions between multiple objects. In this 
domain, specifying the collective behavior first helps to avoid accidental interference 
of the individual interactions. 
As the complexity of distributed embedded systems increases 
and frequent upgrades and maintenance are expected, 
conventional communication models, such as point-to-point and 
the client-server model, are not adequate to keep up with these 
trends. We therefore propose a generic Java real-time communication 
model based on the publish-subscribe model, which has suitable 
characteristics for distributed applications. The proposed Java 
publish-subscribe model has been implemented as middleware, 
named Java Embedded Bus (JEB). To support heterogeneous 
communication technologies easily, we propose a 
middleware-independent plug-and-play JEB device driver model. 
We have also prototyped the middleware on a demonstrative 
distributed embedded system comprising embedded 
machines and workstations connected by VME bus and Ethernet. 



1. Introduction 

Java network computing technology has had a significant impact on distributed 
systems, which are typically connected over the TCP/IP Internet. Since distributed systems 
are composed of heterogeneous computing systems, i.e. different hardware and 
operating systems, Java’s byte-code compatibility and unified communication 
interface have tremendous appeal in the area of distributed systems. Inspired by the 
success of Java in conventional systems, there are significant efforts to adopt Java in 
embedded systems. Java technology has already proven successful in Internet 
appliances such as set-top boxes, PDAs, and hand-held computers. It is therefore 
widely anticipated that Java will become a major programming environment for complex 
distributed embedded systems in the near future. 
The reasons why Java is favorable for distributed embedded systems are as 
follows. Firstly, Java provides a reliable and secure computing environment, ensured 
by safety features such as strong compile-time and run-time verification, an 
efficient exception-handling mechanism, and the sandbox security model. Secondly, 
Java provides strong portability and reusability, which industry can take advantage 
of to reduce costs and development turnaround time. Despite these 
advantages, Java still has weaknesses in embedded applications that 
require strict timing constraints and deterministic behavior. Examples include the 
nondeterministic behavior of garbage collection, the lack of direct hardware 
access, low performance, and weak synchronization. However, since both industry and 
academia recognized these problems early, work is ongoing to standardize a 
real-time Java specification. 

The current real-time Java working groups, the Real-Time Java Experts Group and the 
J-Consortium, focus only on solving the fundamental weaknesses of Java in real-time 
applications mentioned above. Their objectives do not include a practical 
communication middleware architecture for distributed embedded systems. Unlike 
conventional distributed systems, distributed embedded systems are connected by a 
variety of communication technologies such as VME bus, FieldBus, IEEE 1394, 
ARINC, and FDDI, as well as legacy Ethernet. In such cases, we cannot use Java’s 
popular communication model, TCP/IP over the Internet, without expensive solutions. 
Moreover, Java’s communication model does not consider real-time constraints. 

In this paper, we propose Java publish-subscribe middleware that provides 
communication facilities for distributed embedded systems connected by 
heterogeneous communication technologies. As the complexity of distributed 
embedded systems increases, and more frequent upgrades and maintenance are 
expected, conventional communication models such as the point-to-point and 
client-server models are not adequate to keep up with these trends. We have therefore built generic 
Java communication facilities based on the publish-subscribe model, which has 
become popular in distributed communication because it adapts easily 
to system evolution and scales well. We have also added real-time features 
to the publish-subscribe model to guarantee the timing requirements of distributed 
embedded systems. 

The proposed Java publish-subscribe model has been implemented as 
middleware, named Java Embedded Bus (JEB), between the Java virtual machine 
and Java applications. To easily support heterogeneous communication technologies, 
we developed a middleware-independent plug-and-play JEB device driver model. 
The prototype middleware, JEB, is being built on the demonstrative distributed 
embedded system which comprises PowerPC embedded machines and IBM PCs that 
are connected by VME bus and Ethernet. The prototype platforms also include 
VxWorks, Linux, Sun’s JVM, and Personal JWorks. 

The rest of the paper is structured as follows. We describe the design goals in 
section 2. The generic Java communication model is discussed in section 3. We show 
the details of the middleware in section 4. A short conclusion then follows. 



2. Design Goals 
Our main objective is to build a distributed Java embedded application environment 
purely as an extension of the Java language system. The detailed design goals are to 
support the following: 

Generic Real Time Communication Model - practical distributed embedded systems 
have a variety of communication requirements, which depend on the complexity and 
communication behavior of the applications. In simple cases, the point-to-point 
communication model is sufficient. However, safety-critical battleship control systems, 
which require redundancy-based fault tolerance, need a communication model that 
supports fault tolerance. The model should be 
powerful enough to support these various communication requirements. 

Adherence to Java Programming Style - one of the advantages of using Java is 
its simplicity of programming style with the help of a simple language structure and 
an object oriented paradigm. Therefore, the best approach for Java communication is 
to exchange serializable Java objects themselves. The communication middleware 
should cause a minimum impact on the conventional way of Java programming. 

Real Time Behaviors - Java’s TCP/IP communication model cannot guarantee 
real-time constraints such as message deadlines and jitter. To guarantee the real-time 
constraints of distributed embedded systems, real-time control functions must 
be provided at the middleware level. 

Middleware Independent Plug-and-Play Device Driver Model - to provide 
application programmers with the same interface for different communication 
technologies, a middleware-independent plug-and-play device driver model is 
required. With this model, application programmers can significantly reduce 
the effort needed when the hardware-dependent communication part changes. 

Scalability and Interoperability - the model should scale with the growth 
of the communication infrastructure. Interoperability among heterogeneous 
communication technologies is also important. 

Multicast/Broadcast Support - since the publish-subscribe model inherently shares 
characteristics with multicast/broadcast protocols, the middleware should 
take advantage of lower-level multicast/broadcast support when integrating with 
lower-level communication technologies. 



3. Java Publish-Subscribe Real-Time Communication Model 

The building of a Java real-time communication model should begin with analyzing 
communication requirements of real distributed embedded systems. In this section, 
we have classified communication requirements of distributed embedded systems 
and have built a Java publish-subscribe real-time communication model which 
fulfills the requirements. 

3.1. Generic Communication Requirements 

We have classified communication requirements of distributed embedded systems 
into categories of connection types, connection modes, and data types. 

The connection types of distributed embedded systems are categorized into 
four types, according to the relationship between source and destination: 
point-to-point, point-to-multipoint, multipoint-to-point, and multipoint-to-multipoint. 
The latter two types are very important when we build replication-based 
fault-tolerant distributed embedded systems. In a fault-tolerant system, an application 
is usually duplicated on different processors to cope with a failure of the master 
application. 

According to the behavior of a thread (or process) after issuing a send or receive 
primitive, the mode of a connection is either asynchronous or 
synchronous. After a sender transmits an asynchronous message, it waits for 
neither an acknowledgement nor a response from the receiver, so the 
sender does not block after transmitting an asynchronous message. After 
transmitting a synchronous message, a sender is blocked until it receives 
either an acknowledgement or a result. 
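The difference between the two connection modes can be sketched with Python threads and queues. The function names are hypothetical and unrelated to the JEB API: the asynchronous send returns as soon as the message is queued, while the synchronous send blocks on a private acknowledgement queue.

```python
import queue
import threading
from typing import List

inbox: queue.Queue = queue.Queue()   # the receiver's incoming messages


def receiver(handled: List[str]) -> None:
    while True:
        msg, ack = inbox.get()
        if msg is None:                  # sentinel: stop the receiver thread
            break
        handled.append(msg)
        if ack is not None:              # a synchronous sender is waiting
            ack.put(f"ack:{msg}")


def send_async(msg: str) -> None:
    inbox.put((msg, None))               # returns immediately, no reply expected


def send_sync(msg: str, timeout: float = 5.0) -> str:
    ack: queue.Queue = queue.Queue(maxsize=1)
    inbox.put((msg, ack))
    return ack.get(timeout=timeout)      # the sender blocks here until acknowledged


handled: List[str] = []
worker = threading.Thread(target=receiver, args=(handled,))
worker.start()
send_async("sensor-1")
reply = send_sync("command-1")
inbox.put((None, None))
worker.join()
print(reply, handled)   # ack:command-1 ['sensor-1', 'command-1']
```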

According to the semantics of the messages to be exchanged, we can distinguish three 
data types: signals, commands/requests, and events. Signals are defined as 
unidirectional data flows that carry continuous information such as sensor data. Signal 
data communication is typically characterized as time-critical (updates are useless if 
old), idempotent (repeated updates are acceptable), and last-is-best (the latest 
information is more important than retrying missed samples). For commands/requests, 
an application must not miss any intermediate command or execute a command twice. 
Similarly, embedded applications occasionally need to issue specific requests for 
data. Commands/requests imply a two-way transaction. Events are used to 
synchronize task execution with asynchronous operations. For example, pump tasks 
do not run until the float task indicates that the level has fallen below a set value. 
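The contrasting delivery semantics of signals and commands can be sketched as two small Python containers. Both classes are hypothetical; the staleness bound `max_age`, the explicit `now` timestamps and the per-command sequence numbers are our assumptions, and detection of missing commands is omitted for brevity.

```python
from collections import deque
from typing import Any, Deque, Optional, Set


class SignalSlot:
    """Signal semantics: last-is-best, idempotent reads, stale data discarded."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self._sample: Any = None
        self._stamp: Optional[float] = None

    def write(self, sample: Any, now: float) -> None:
        self._sample, self._stamp = sample, now   # overwrite: last is best

    def read(self, now: float) -> Any:
        if self._stamp is None or now - self._stamp > self.max_age:
            return None                           # too old to be useful
        return self._sample                       # re-reading is harmless


class CommandQueue:
    """Command semantics: each command executed at most once, in arrival order."""

    def __init__(self) -> None:
        self._queue: Deque[Any] = deque()
        self._seen: Set[int] = set()

    def submit(self, seq: int, command: Any) -> None:
        if seq in self._seen:
            return                                # duplicate: do not run twice
        self._seen.add(seq)
        self._queue.append(command)

    def take(self) -> Any:
        return self._queue.popleft() if self._queue else None


level = SignalSlot(max_age=0.05)
level.write(1.0, now=0.00)
level.write(1.2, now=0.01)                        # replaces the unread 1.0
print(level.read(now=0.02), level.read(now=0.10))  # 1.2 None

pump = CommandQueue()
pump.submit(1, "start")
pump.submit(1, "start")                           # retransmission
pump.submit(2, "stop")
print(pump.take(), pump.take(), pump.take())      # start stop None
```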

3.2. Real Time Java Publish-Subscribe Model 

Some of the most important objectives in building distributed embedded systems 
are simplicity, evolvability, scalability, and reusability. The conventional 
client-server model is not well suited to these purposes. For example, when the complexity 
of a system is high, there may be no clear decomposition into servers and clients. 
Adding or removing applications, whether statically or dynamically, also becomes very difficult. 

The publish-subscribe model is a very good candidate to overcome these 
problems. In the publish-subscribe model, a sender does not need to know who needs 
its message and a receiver does not care who sends the message. This scheme greatly 
reduces the complexity of communication architecture and makes it easy to upgrade 
and reuse the application components. Therefore, we have adopted a publish- 
subscribe model as a core architecture of our middleware. 

3.2.1. Basic Publish-Subscribe Architecture. All threads (processes) 
that need to communicate with others must first join the publish-subscribe group. 
A publisher that wants to publish its messages then registers its message subject 
with the publish-subscribe group, and a subscriber that needs the message registers 
itself with the specific subject group. When a subscriber registers for the subject, it 
is enrolled in the publisher’s destination list, so whenever the publisher 
publishes data, the data is forwarded to the subscriber. The basic 
publish-subscribe architecture is shown in Figure 1. 
Figure 1. Basic Publish-Subscribe Architecture 
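The registration and forwarding steps can be sketched in Python as follows. `PublishSubscribeGroup` and its methods are hypothetical names, not the JEB API; the point is that a publisher addresses only a subject, and the group forwards the data to every handler in that subject's destination list.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Set

Handler = Callable[[Any], None]


class PublishSubscribeGroup:
    """Hypothetical sketch of the basic architecture (not the JEB API)."""

    def __init__(self) -> None:
        self.members: Set[str] = set()
        self.subjects: Set[str] = set()
        self.destinations: DefaultDict[str, List[Handler]] = defaultdict(list)

    def join(self, member: str) -> None:
        self.members.add(member)

    def register_subject(self, publisher: str, subject: str) -> None:
        self._require_member(publisher)
        self.subjects.add(subject)

    def subscribe(self, subscriber: str, subject: str, handler: Handler) -> None:
        self._require_member(subscriber)
        self.destinations[subject].append(handler)   # enrol in destination list

    def publish(self, publisher: str, subject: str, data: Any) -> int:
        """Forward data to every subscriber of the subject; return how many."""
        self._require_member(publisher)
        if subject not in self.subjects:
            raise KeyError(f"subject {subject!r} has not been registered")
        for handler in self.destinations[subject]:
            handler(data)
        return len(self.destinations[subject])

    def _require_member(self, name: str) -> None:
        if name not in self.members:
            raise PermissionError(f"{name} has not joined the group")


group = PublishSubscribeGroup()
for member in ("gyro", "autopilot", "logger"):
    group.join(member)
group.register_subject("gyro", "attitude")
autopilot_inbox: List[Any] = []
logger_inbox: List[Any] = []
group.subscribe("autopilot", "attitude", autopilot_inbox.append)
group.subscribe("logger", "attitude", logger_inbox.append)
print(group.publish("gyro", "attitude", {"roll": 0.5}))  # 2
```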

3.2.2. Enhanced Publish-Subscribe Architecture. To meet the requirements of 
distributed embedded systems, we have enhanced the basic publish-subscribe model 
by adding several new features as follows. 

3.2.2.1. Real-time support. There have been significant research efforts to guarantee 
real-time constraints with priority-based scheduling. To take advantage of this 
research, we also provide a priority-based scheduler for publishers and 
subscribers. For a publish-subscribe subject, priorities can be set along the 
route of a message. For example, in the publisher’s processor environment, the 
message is scheduled based on the publisher-priority; on the subscriber’s side, an 
incoming message is scheduled based on its subscriber-priority. To 
enforce the negotiated traffic behavior, the middleware is also equipped with a traffic 
shaping function. 
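Priority-based dispatch at each hop can be sketched with Python's `heapq`. The class is hypothetical, smaller numbers denote higher priority by assumption, and a sequence counter keeps FIFO order among equal priorities. The same message can be ranked differently on the publisher side and on the subscriber side. Traffic shaping is not modelled.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PriorityScheduler:
    """Dispatch queued messages highest priority first (smaller number =
    higher priority, an assumption); FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()

    def enqueue(self, priority: int, message: Any) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def dispatch_all(self) -> List[Any]:
        out = []
        while self._heap:
            _, _, message = heapq.heappop(self._heap)
            out.append(message)
        return out


# Publisher side: ordered by the publisher-priority of each subject.
pub = PriorityScheduler()
pub.enqueue(5, "telemetry")
pub.enqueue(1, "engine-alarm")
pub.enqueue(5, "status")
pub_order = pub.dispatch_all()
print(pub_order)   # ['engine-alarm', 'telemetry', 'status']

# Subscriber side: the same message may carry a different subscriber-priority.
sub = PriorityScheduler()
sub.enqueue(2, "telemetry")
sub.enqueue(9, "engine-alarm")
sub_order = sub.dispatch_all()
print(sub_order)   # ['telemetry', 'engine-alarm']
```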

3.2.2.2. Synchronous communication support. The basic publish-subscribe architecture 
is built on an asynchronous communication model, so the message flow between a 
publisher and a subscriber is unidirectional. However, data types such as 
commands/requests require two-way communication, and the publisher should be 
blocked until it gets either an acknowledgement or a response from the subscriber. 
When such a subject is registered in the publish-subscribe group, two corresponding 
subjects are registered to accommodate two-way communication. In this case, the 
middleware must support blocking the publisher after it publishes a message 
until a message is received from the subscriber. 
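A hypothetical Python sketch of this scheme registers a synchronous subject as two one-way subjects, `<name>/request` and `<name>/reply`, and blocks the publisher on the reply subject. The naming convention and the `SyncBus` class are our assumptions; with several concurrent callers, a correlation identifier would also be needed.

```python
import queue
import threading
from typing import Any, Dict


class SyncBus:
    """Hypothetical sketch: a synchronous subject is registered as a pair of
    one-way subjects, '<name>/request' and '<name>/reply'."""

    def __init__(self) -> None:
        self.subjects: Dict[str, queue.Queue] = {}

    def register_sync_subject(self, name: str) -> None:
        self.subjects[f"{name}/request"] = queue.Queue()
        self.subjects[f"{name}/reply"] = queue.Queue()

    def publish(self, subject: str, data: Any) -> None:
        self.subjects[subject].put(data)

    def take(self, subject: str, timeout: float = 5.0) -> Any:
        return self.subjects[subject].get(timeout=timeout)

    def call(self, name: str, data: Any, timeout: float = 5.0) -> Any:
        """Publish a request, then block the publisher until the reply arrives."""
        self.publish(f"{name}/request", data)
        return self.take(f"{name}/reply", timeout)


bus = SyncBus()
bus.register_sync_subject("valve")


def valve_subscriber() -> None:
    command = bus.take("valve/request")
    bus.publish("valve/reply", f"done:{command}")


worker = threading.Thread(target=valve_subscriber)
worker.start()
result = bus.call("valve", "open")
worker.join()
print(result)  # done:open
```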

3.2.2.3. Fault tolerance support. For multipoint-to-point and multipoint-to-multipoint 
connections, we provide two different semantics each for publishers and subscribers, to support 
a diversity of real-time applications. In combination, this gives four different 
semantics for these two connection types. Firstly, on the publisher side, multiple 
publishers can follow either the master/shadow model or the client-server model. In the 
master/shadow model, only one publisher, the active publisher, can have its data 
delivered to registered subscribers. Data from the other publishers, the shadow publishers, 
is not forwarded to subscribers. This model is very useful in replication-based 
fault-tolerant systems, where only the active copy of the application may send data. 
Election schemes such as primary-backup and leader election are supplied at the 
middleware level to relieve programmers of this burden. 
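The publisher side of the master/shadow model can be sketched in Python as follows. The class is hypothetical, and choosing the lowest surviving identifier is only a deterministic stand-in for the primary-backup or leader election schemes mentioned above.

```python
from typing import Any, Iterable, List, Set


class MasterShadowGroup:
    """Publisher side of the master/shadow model (sketch)."""

    def __init__(self, publisher_ids: Iterable[int]) -> None:
        self.alive: Set[int] = set(publisher_ids)
        self.master = min(self.alive)      # stand-in election: lowest id wins
        self.delivered: List[Any] = []

    def publish(self, publisher_id: int, data: Any) -> bool:
        if publisher_id != self.master:
            return False                   # shadow output is suppressed
        self.delivered.append(data)
        return True

    def fail(self, publisher_id: int) -> None:
        self.alive.discard(publisher_id)
        if publisher_id == self.master:
            if not self.alive:
                raise RuntimeError("no replica left to take over")
            self.master = min(self.alive)  # elect a new master


group = MasterShadowGroup([1, 2, 3])
for replica in (1, 2, 3):
    group.publish(replica, "cmd-a")        # all replicas compute the same output
group.fail(1)
for replica in (2, 3):
    group.publish(replica, "cmd-b")
print(group.master, group.delivered)       # 2 ['cmd-a', 'cmd-b']
```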

3.2.2.4. Reservation based rate control support. We also support a rate adjustment 
capability to guarantee the timing requirements of both publisher and subscriber. For 
example, suppose a publisher registers that it can publish sensor data at a frequency of 
1000 Hz, and two subscribers need the sensor data at 1000 Hz and 
10 Hz respectively. In this case, the publisher cannot be allowed to publish sensor 
data at 1000 Hz to both subscribers, because the subscriber that expects 10 Hz sensor data 
may fail due to too many arrivals. The middleware provides a 
subscriber-initiated negotiation protocol and guarantees the negotiated rate. 
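The numbers in this example imply a decimation factor of 1000 / 10 = 100 for the slow subscriber. The Python sketch below assumes, purely for illustration, that the middleware enforces a granted rate by forwarding every factor-th sample; the section does not say how the negotiated rate is enforced, and the function names are hypothetical.

```python
from typing import Dict, List


def negotiate(publisher_hz: int, requested_hz: int) -> int:
    """Grant a rate no higher than the publisher's; return the decimation factor."""
    if requested_hz <= 0:
        raise ValueError("requested rate must be positive")
    granted = min(requested_hz, publisher_hz)
    if publisher_hz % granted != 0:
        raise ValueError("this sketch only supports integer decimation factors")
    return publisher_hz // granted


def deliver(samples: List[int], factor: int) -> List[int]:
    """Forward every factor-th sample to the subscriber."""
    return samples[::factor]


factors: Dict[str, int] = {"fast": negotiate(1000, 1000), "slow": negotiate(1000, 10)}
one_second = list(range(1000))            # 1000 samples published in one second
counts = {name: len(deliver(one_second, f)) for name, f in factors.items()}
print(factors)   # {'fast': 1, 'slow': 100}
print(counts)    # {'fast': 1000, 'slow': 10}
```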



4. Java Real-Time Communication Middleware 

We have implemented the Java publish-subscribe real-time communication model as 
middleware, which is located between the Java virtual machine and Java 
applications. Since the logical behavior of the publish-subscribe model is similar to 
that of a hardware bus, we named the middleware Java Embedded Bus (JEB). The 
JEB is a three-layer architecture, which includes JEB API, JEB Family, and JEB 
device driver from top to bottom. We show the general architecture of the JEB in 
Figure 2. In this section, we describe JEB Family, JEB API and JEB device driver. 
Figure 2. JEB Middleware Architecture 
4.1. JEB Family 

JEB Family is the core part of the middleware and provides distributed Java 
applications with real-time communication facilities such as routing, control and 
maintenance, and real-time scheduling. According to their roles and locations, we 
have devised four components: JEB Core, JEB Repeater, JEB Hub, and JEB NS 
(Name Server). All four components are implemented as Java classes that implement a 
common Java interface “JEB”. We first describe their main functions as follows. 

Routing: Unlike conventional network communication, which uses network 
addresses for routing, the JEB uses application-oriented subjects to route a published 
message. A subject name is uniquely assigned to a group of publishers and 
subscribers that exchange messages. 

Control and Maintenance: To provide a dynamic communication environment, 
the JEB supports join and leave protocols for client participation, and publishing 
and subscribing protocols for subject management. 
Scheduling: To guarantee the real-time constraints imposed on publishers and
subscribers, the JEB provides priority-driven message schedulers in all members of
the JEB family.
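A priority-driven message scheduler of the kind each family member is said to contain could look like the sketch below. The text does not define the priority scale, so the sketch assumes that a larger integer means higher priority and that messages of equal priority are served in arrival order. PriorityMessageScheduler is a hypothetical name.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a priority-driven message scheduler.
// Higher priority values are dispatched first; equal priorities keep FIFO order.
class PriorityMessageScheduler {
    private static final class Entry {
        final int priority;
        final long seq;        // arrival order, used as a tie-breaker
        final Object message;

        Entry(int priority, long seq, Object message) {
            this.priority = priority;
            this.seq = seq;
            this.message = message;
        }
    }

    private final PriorityQueue<Entry> queue = new PriorityQueue<>((a, b) ->
            a.priority != b.priority ? Integer.compare(b.priority, a.priority)
                                     : Long.compare(a.seq, b.seq));
    private long nextSeq = 0;

    synchronized void enqueue(Object message, int priority) {
        queue.add(new Entry(priority, nextSeq++, message));
    }

    // Returns the next message to dispatch, or null if nothing is queued.
    synchronized Object next() {
        Entry e = queue.poll();
        return e == null ? null : e.message;
    }
}
```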

Each node uses the JEB family module that matches its role in the overall JEB
distributed embedded system. The simplest module is JEB Core, which provides JEB
facilities for Java clients, either publishers or subscribers, in the same Java virtual
machine. When a publisher or a subscriber needs to connect to clients located in
remote systems, it requires JEB Repeater. A JEB Repeater may have only one
communication medium, for example Ethernet or VME. With the help of a JEB device
driver, it provides its clients with JEB publishing and subscribing services. JEB Hub
is used for communication between nodes that have different communication media.
For example, when a node with a VME interface needs to communicate with a node
that has an Ethernet interface, it must be assisted by a JEB Hub that supports both
media. The most sophisticated module of the JEB family is JEB NS, which includes
network and subject management functions. JEB NS maintains all publish-subscribe
subjects registered within the part of the distributed embedded system for which it is
responsible. A JEB distributed embedded system is configured as a logical hierarchy
tree that includes these four types of nodes.
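The role descriptions above can be read as a simple decision rule for choosing a module. The following sketch encodes that reading. The enum, the selector, and its parameters are hypothetical and only restate the text: NS for subject management, Hub for more than one medium, Repeater for remote clients over one medium, and Core otherwise.

```java
// Hypothetical encoding of the module-selection rule described above.
enum JebModule { CORE, REPEATER, HUB, NS }

final class JebModuleSelector {
    private JebModuleSelector() { }

    static JebModule select(boolean managesSubjects, int mediaCount, boolean hasRemoteClients) {
        if (managesSubjects) {
            return JebModule.NS;        // network and subject management
        }
        if (mediaCount > 1) {
            return JebModule.HUB;       // bridges different media, e.g. VME and Ethernet
        }
        if (hasRemoteClients) {
            if (mediaCount == 0) {
                throw new IllegalArgumentException("remote clients need a communication medium");
            }
            return JebModule.REPEATER;  // exactly one medium
        }
        return JebModule.CORE;          // publishers and subscribers in the same JVM only
    }
}
```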

4.2. JEB Protocol 

In this section, we describe the essential JEB protocols: JEB-boot, join-leave, and
publish-subscribe.

4.2.1. JEB-boot. The JEB-boot protocol is responsible for establishing a connection
to a parent node in the logical hierarchy tree. When a JEB node boots, it reports its
own configuration information to its parent node, and this information is eventually
forwarded to JEB NS. Once the network is set up, it is ready to accept join and leave
requests from publishers and subscribers.
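The upward propagation of boot reports through the logical hierarchy tree can be sketched as follows. JebNode is a hypothetical class, configuration data is reduced to a string, and the sketch assumes that the JEB NS is the root of the tree. Connection setup and acknowledgements are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of JEB-boot: configuration reports travel up the
// logical hierarchy tree until they reach the root, assumed to be the JEB NS.
class JebNode {
    final String name;
    final JebNode parent;                             // null for the root (JEB NS)
    final List<String> registry = new ArrayList<>();  // filled only at the root

    JebNode(String name, JebNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Called when this node boots: report its configuration towards the root.
    void boot(String config) {
        report(name + ": " + config);
    }

    // Relay a report one level up, or record it if this node is the root.
    private void report(String entry) {
        if (parent == null) {
            registry.add(entry);
        } else {
            parent.report(entry);
        }
    }
}
```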

4.2.2. JEB join-leave. JEB allows publishers and subscribers to join and leave the
JEB infrastructure dynamically. When a class needs to use JEB facilities, it has to join
the JEB infrastructure using the join protocol. By using the join protocol, a class
enrolls itself as a member of the JEB infrastructure and can launch the internal JEB
event handler to process future events. When a class needs to depart from the current
JEB group, it executes the leave protocol so that all membership-related references
are freed.
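The bookkeeping behind join and leave can be sketched as a membership table that hands out client identifiers on join and drops every reference on leave. MembershipTable and the return-code convention (0 for success, -1 for an unknown client) are assumptions made for illustration. Launching the internal event handler is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical membership table for the join and leave protocols.
class MembershipTable {
    private final Map<Integer, String> names = new HashMap<>();
    private final Map<Integer, Object> clients = new HashMap<>();
    private int nextId = 1;

    // Join: enroll the client and return its identifier for later calls.
    synchronized int join(String clientName, Object client) {
        int id = nextId++;
        names.put(id, clientName);
        clients.put(id, client);
        return id;
    }

    // Leave: free all membership-related references (0 = ok, -1 = unknown id).
    synchronized int leave(int clientId) {
        boolean known = clients.remove(clientId) != null;
        names.remove(clientId);
        return known ? 0 : -1;
    }

    synchronized boolean isMember(int clientId) {
        return clients.containsKey(clientId);
    }
}
```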

4.2.3. JEB publish-subscribe. A publisher can send a Java serializable object to its
subscribers using the publish-subscribe protocol. In JEB, communication between a
publisher and a subscriber takes two steps. First, a publisher adds the subject to the
JEB subject lists using jebRegSubject(); a subscriber can then register itself to that
specific subject using jebRegSubscriber(). Depending on the level at which both have
joined, the corresponding JEB family members store the routing information (subject,
publisher, subscriber). Second, when a publisher sends a message to JEB, the JEB
family takes care of delivering the message from the publisher to all registered
subscribers in the JEB infrastructure.
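The two steps, together with routing by subject rather than by network address, can be illustrated by an in-process sketch. SubjectRouter is hypothetical and subscribers are modeled as callbacks. The distribution of routing entries across family members, which the text ties to the level at which clients join, is collapsed into a single table.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process sketch of the two-step JEB publish-subscribe flow.
// Messages are routed by subject identifier, not by network address.
class SubjectRouter {
    private final Map<String, Integer> subjectIds = new HashMap<>();
    private final Map<Integer, List<Consumer<Serializable>>> routes = new HashMap<>();

    // Step 1a: a publisher registers a subject and receives its identifier.
    synchronized int registerSubject(String subjectName) {
        Integer id = subjectIds.get(subjectName);
        if (id == null) {
            id = subjectIds.size() + 1;
            subjectIds.put(subjectName, id);
            routes.put(id, new ArrayList<>());
        }
        return id;
    }

    // Step 1b: a subscriber registers itself to an existing subject (-1 if unknown).
    synchronized int registerSubscriber(String subjectName, Consumer<Serializable> subscriber) {
        Integer id = subjectIds.get(subjectName);
        if (id == null) {
            return -1;
        }
        routes.get(id).add(subscriber);
        return id;
    }

    // Step 2: deliver a published object to every subscriber of the subject.
    synchronized int publish(int subjectId, Serializable item) {
        List<Consumer<Serializable>> subs = routes.get(subjectId);
        if (subs == null) {
            return 0;
        }
        for (Consumer<Serializable> s : subs) {
            s.accept(item);
        }
        return subs.size();   // number of deliveries
    }
}
```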

4.3. JEB Classes and API 

The JEB middleware is provided in the form of Java class libraries. In this section,
we show one of the most basic classes of the JEB middleware: the class “JEB”, from
which all four members of the JEB family inherit. Table 1 lists the basic methods of
the class “JEB”.



Table 1. Basic Methods of a Class “JEB”

Return  Method and description

int     jebJoin(String clientName, Object client)
        Attach a client (publisher or subscriber) to the JEB infrastructure.

int     jebLeave(int clientId, String clientName, Object client)
        Detach a client (publisher or subscriber) from the JEB infrastructure.

int     jebRegSubject(int clientId, String subjectName)
        Register a subject with the JEB infrastructure in default mode.

int     jebRegSubscriber(int clientId, String subjectName)
        Register a subscriber to the subject in the JEB infrastructure.

int     jebPublishData(int subjId, Object dataItem)
        Publish dataItem to all subscribers.

        jebFetchData(int subjId)
        Fetch the most recent subject data from JEB.

int     jebSetClientListener(int subjId, JebClient jc)
        Register a client listener for the subject.



// Publisher.java

import java.io.*;
import jeb.*;

public class Publisher extends Thread implements JebClient {

    JebUDPIPDriver jebUdpip = null;
    JebRepeater jebRpt = null;

    public void run() {
        // Start the UDP/IP JEB device driver
        jebUdpip = new JebUDPIPDriver();
        jebUdpip.start();

        // Start the JEB Repeater that uses this driver
        jebRpt = new JebRepeater("publisher",
                "ns.ufl.edu", "publisher.ufl.edu", jebUdpip);
        jebRpt.start();

        // Join the JEB infrastructure, then register the subject
        int clientId = jebRpt.jebJoin("pub1", this);
        int subjId = jebRpt.jebRegSubject(clientId, "integerData");

        int sensor = 0;   // Integer is immutable, so count with an int

        while (true) {
            jebRpt.jebPublishData(subjId, Integer.valueOf(sensor));
            sensor++;
            try {
                sleep(1000);   // publish once per second
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}



// Subscriber.java

import java.io.*;
import jeb.*;

public class Subscriber extends Thread implements JebClient {

    JebUDPIPDriver jebUdpip = null;
    JebRepeater jebRpt = null;

    public void run() {
        jebUdpip = new JebUDPIPDriver();
        jebUdpip.start();

        jebRpt = new JebRepeater("subscriber",
                "ns.ufl.edu", "subscriber.ufl.edu", jebUdpip);
        jebRpt.start();

        // Join, then subscribe to the subject registered by the publisher
        int clientId = jebRpt.jebJoin("sub1", this);
        int subjId = jebRpt.jebRegSubscriber(clientId, "integerData");

        Integer sensor;

        while (true) {
            // Fetch the most recent value published under the subject
            sensor = (Integer) jebRpt.jebFetchData(subjId);
            System.out.println(sensor);
            try {
                sleep(1000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}



Figure 3. Publisher and Subscriber Sample Program 
In Figure 3, to aid understanding of the middleware, we show two sample
programs: a Publisher and a Subscriber. They are connected through the UDP/IP JEB
device driver, JebUDPIPDriver.

4.4. JEB Device Driver Model

The JEB device driver is responsible for connecting the JEB middleware to the
physical communication media. Due to the diversity of physical communication
media, we need a middleware-independent, plug-and-play device driver model. With
this model, the physical communication medium can easily be changed while most of
the application code remains untouched. Application programmers only have to select
the proper JEB device driver and provide its configuration information.

4.4.1. JEB Device Driver Architecture. Figure 4 shows the abstract architecture of
the JEB device driver model and its relationship with the JEB middleware. In the
device-driver layer, we use network addresses for routing messages and byte-streamed,
serialized Java objects for message encoding. At the middleware level, subjects and
Java objects are used correspondingly. A JEB device driver should implement JEB
channel management, a channel scheduler, and marshal/unmarshal functions. It should
also implement the four standard methods of the JEB device driver interface (API)
shown in Table 2.
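The marshal/unmarshal functions map naturally onto standard Java object serialization. A minimal sketch using java.io.ObjectOutputStream and java.io.ObjectInputStream follows. The class name JebMarshaller is hypothetical, and a real driver would also add framing and addressing for the medium.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of driver-level marshal/unmarshal: a Java object becomes a byte
// stream for the communication medium, and the byte stream becomes an object again.
final class JebMarshaller {
    private JebMarshaller() { }

    static byte[] marshal(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);   // closing the stream flushes it into 'bytes'
        }
        return bytes.toByteArray();
    }

    static Object unmarshal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```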




Figure 4. JEB Device Driver Architecture 

When a message is received, the device driver calls back the jebOnReceiving
method of the JebRXListener interface, which is implemented by every JEB family
member. This method stores the received JEB message in the JEB receive priority
queue. When a message is transmitted, the middleware calls the jebDevSend method,
which stores the JEB message in the driver's transmission priority queue. The TX
thread then schedules the queued messages and transmits them.
4.4.2. Device Driver Interface. The only requirement for a JEB device-driver
developer is to implement the four interface methods shown in Table 2. Configuration
information can be supplied to the Java constructor of the corresponding device-driver
class. For example, in the case of the UDP/IP JEB device driver, JebUDPIPDriver, the
constructor is provided with the IP addresses of the local host and a parent host. The
port number is fixed for all JEB UDP/IP device drivers. Using this model, we have
prototyped two types of JEB device driver, for IP/Ethernet and for VME bus. For
IP/Ethernet, TCP and UDP are used in separate device drivers. The TCP/IP driver can
be used for distributed systems that need more reliable communication, whereas the
UDP/IP driver suits lightweight systems that can tolerate less reliable communication.

Table 2. JEB Device Driver Interface

Return       Method and description

public void  jebDevInit()
             Initialize the JEB device driver.

public void  jebDevStart()
             Start the JEB device driver.

public void  jebDevSetListener(JebRXListener jrl)
             Register the RX message listener (JEB).

public void  jebDevSend(JebMessage jmsg, Vector dest, int priority)
             Send a JEB message.
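To illustrate the four-method contract, the sketch below implements a loopback driver in which every transmitted message is handed straight back to the registered listener. The lower-case method names follow the prose (jebDevSend, jebOnReceiving). JebMessage, JebRXListener, and JebDeviceDriver are minimal stand-ins written for this example, because their real definitions are not given. The priority ordering (larger value first) is also an assumption.

```java
import java.util.Vector;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Minimal stand-ins for JEB types (hypothetical; real definitions are not given).
class JebMessage {
    final String subject;
    final Object payload;

    JebMessage(String subject, Object payload) {
        this.subject = subject;
        this.payload = payload;
    }
}

interface JebRXListener {
    void jebOnReceiving(JebMessage msg);   // RX callback into the JEB family member
}

// The four device-driver methods of Table 2.
interface JebDeviceDriver {
    void jebDevInit();
    void jebDevStart();
    void jebDevSetListener(JebRXListener jrl);
    void jebDevSend(JebMessage jmsg, Vector<?> dest, int priority);
}

// Hypothetical loopback driver: "transmitting" a message delivers it to the local listener.
class LoopbackDriver implements JebDeviceDriver {
    private static final class TxEntry {
        final JebMessage msg;
        final int priority;
        final long seq;

        TxEntry(JebMessage msg, int priority, long seq) {
            this.msg = msg;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Transmission priority queue: larger priority first, FIFO among equals.
    private final PriorityBlockingQueue<TxEntry> txQueue = new PriorityBlockingQueue<>(16,
            (a, b) -> a.priority != b.priority ? Integer.compare(b.priority, a.priority)
                                               : Long.compare(a.seq, b.seq));
    private final AtomicLong seq = new AtomicLong();
    private volatile JebRXListener listener;

    public void jebDevInit() {
        txQueue.clear();
    }

    public void jebDevStart() {
        Thread tx = new Thread(() -> {              // TX thread: schedule and "transmit"
            try {
                while (true) {
                    TxEntry e = txQueue.take();     // blocks until a message is queued
                    JebRXListener l = listener;
                    if (l != null) {
                        l.jebOnReceiving(e.msg);    // loopback: hand it to the RX side
                    }
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // stop when interrupted
            }
        }, "jeb-tx");
        tx.setDaemon(true);
        tx.start();
    }

    public void jebDevSetListener(JebRXListener jrl) {
        listener = jrl;
    }

    public void jebDevSend(JebMessage jmsg, Vector<?> dest, int priority) {
        // dest is ignored here; a network driver would map it to addresses.
        txQueue.put(new TxEntry(jmsg, priority, seq.getAndIncrement()));
    }
}
```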



5. Conclusion 

To provide communication facilities for Java-based distributed embedded systems,
we have developed Java publish-subscribe middleware. The middleware preserves the
Java programming style and supports real-time behavior, a variety of communication
technologies via the standard device driver model, scalability and interoperability,
and the exploitation of lower-level multicast/broadcast. As Java technology becomes
popular in real-time systems, along with the upcoming real-time Java standards, Java
publish-subscribe middleware will play an important role in designing complex,
evolvable, integrated, and distributed embedded systems efficiently.

Our future work is to support distributed embedded systems that use different
languages, such as Ada, C, and C++. Although there is no doubt that Java will be a
popular language for real-time applications in the near future, we do not expect all
systems to be written only in Java. It is therefore important to support systems
written in different languages. We are considering using either XML or CORBA for
exchanging objects between different language systems.



References 

G. Hilderink, J. Broenink, and A. Bakkers. (1998) “A new Java Thread Model for Concurrent
Programming of Real-time Systems.” Real-Time Magazine.










G. Pardo-Castellote, S. Schneider, M. Hamilton. (1997) “NDDS: the Real-Time Publish-
Subscribe Network.” IEEE Workshop on Middleware for Distributed Real-Time Systems
and Services.

J-Consortium. (1999) Draft International J Consortium Specification. 

L. Carnahan, M. Ruark. (1999) Requirements for Real-time Extensions for the Java Platform, 
National Institute of Standards and Technology. 

M. Colan. (1999) InfoBus 1.2 Specification, Sun Microsystems. 

M. Hapner, R. Burridge, R. Sharma. (1999) Java Message Service Specification, Sun
Microsystems.

M. Swick, J. White, and M. Masters. (1998) “A Summary of Communication Middleware 
Requirements for Advanced Shipboard Computing Systems.” IEEE Real-Time Technology 
and Applications Symposium. 

The Real Time for Java Experts Group. (2000), Real Time Specification for Java, Sun 
Microsystems. 




A VERIFIED HARDWARE SYNTHESIS 
OF ESTEREL PROGRAMS 

Klaus Schneider 



Institute for Computer Design and Fault Tolerance (Prof. Dr.-Ing. D. Schmid) 
University of Karlsruhe, P.O. Box 6980, 76128 Karlsruhe 
Klaus.Schneider@informatik.uni-karlsruhe.de
http://goethe.ira.uka.de/~schneider 



Synchronous programming languages like Esterel are
becoming more and more popular for the design of
multi-threaded reactive systems. We have embedded
a variant of the Esterel language in the interactive
theorem prover HOL so that we can formally reason
about programs of the language and, at a meta level,
about the language itself. Based on a separation of
the control and data flow of the programs, we have
defined a new translation to equation systems. Our
new translation is simpler than state-of-the-art
translations, and it does not suffer from the
schizophrenia problems of parallel statements.
Furthermore, we have proved the correctness of our
translation with HOL, so that HOL can be used for
formal synthesis.

1 INTRODUCTION 

Synchronous languages are becoming more and more attractive [1, 2, 3, 4, 5] for the 
design and the formal verification of parallel reactive real-time systems. There are 
imperative languages like Esterel [6, 7, 1], data flow languages [9], and graphical 
languages like some Statechart variants [10, 11]. In this paper, we concentrate on
imperative languages, but we emphasize that graphical and imperative languages can
be naturally translated into each other [11].

The basic paradigm of (imperative) synchronous languages is perfect synchrony,
which means that most statements are executed in zero time (at least in the idealized
programmer’s model). Consumption of time must be explicitly programmed with
special statements, such as the pause statement in Esterel. Each execution of a pause
statement consumes one logical unit of time, and therefore separates different
interactions. As the pause statement is the only basic statement that consumes time,
all threads of a synchronous program run in lockstep: they execute the code between
pause statements in zero time, and synchronize at the next pause statements. Note that
this synchronization is simply due to the semantics of the language.

The control flow of a synchronous program $\mathcal{P}$ can therefore be compiled into a
finite state machine $\mathcal{A}_{\mathcal{P}}$ by describing how the control flow moves from a set
of currently active pause statements to the set of pause statements that are active at
the next point of time. Of course, we must also consider the data flow of a program,
i.e., how the transitions of the control flow manipulate the data values of the program.
Therefore, we can model any Esterel program by a finite-state control flow that
manipulates possibly infinite data types. The control and data flow can be described
in the form of equation systems that can furthermore be converted into a sequential
(i.e., single-threaded) imperative program, such as a C or Java program [1, 4, 5], or
into a VHDL program to synthesize a hardware circuit. Hence, Esterel programs can
be used for both hardware and software generation.
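As a small illustration of how an equation system becomes a sequential program, consider the program emit a; ℓ: pause; emit b. The sketch below hand-compiles it into a Java step function. It uses a boot location ℓ0 (as introduced in Section 3.1) and one boolean per pause label. The class and field names are hypothetical, and the equations were derived by hand, not by an Esterel or Quartz compiler.

```java
// Hypothetical hand-compiled step function for:  emit a; l: pause; emit b
// Control flow:  init(l0) = 1, next(l0) = 0;  init(l) = 0, next(l) = l0
// Data flow:     a = l0 (emit a in the starting instant), b = l (emit b after the pause)
final class SyncProgram {
    private boolean l0 = true;   // boot location, active only in the first instant
    private boolean l = false;   // location of the pause statement

    boolean a;                   // event outputs of the most recent instant
    boolean b;

    // One reaction: compute the outputs, then move the control flow.
    void step() {
        a = l0;
        b = l;
        boolean nextL = l0;      // the pause is reached in the starting instant
        l0 = false;
        l = nextL;
    }

    // No location is active any more, so the program has terminated.
    boolean terminated() {
        return !l0 && !l;
    }
}
```

Each call of step() is one reaction. The first instant emits a, the second emits b, and after that no location is active, so the program has terminated.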

The translation of synchronous programs to the corresponding equation system is
an essential means for code generation and formal verification. Therefore, many ways
of performing this translation have been studied: [12] distinguishes between a
process-algebraic, a finite-state machine, and a hardware circuit semantics. The
process-algebraic and the finite-state machine semantics enumerate the control states
of a program by a depth-first traversal, so that the control flow state machine is
explicitly constructed. Therefore, these translations suffer from the drawback that a
program of length $n$ may have $O(n)$ pause statements and therefore $O(2^n)$ states.

Newer versions of Esterel compilers work more efficiently [12]: they translate a
program of length $n$ in polynomial time to an equivalent equation system that is
expressed in terms of hardware circuits. Hence, one often speaks of a ‘hardware
synthesis’, although this representation is used for software synthesis as well.
Although these equations might still define $O(2^n)$ states, the translation can be
performed in polynomial time, since the equation system makes use of a symbolic
representation [13].
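The gain of the symbolic representation can be made concrete by a simple count. It assumes, as in the translation of Section 3, that each of the $p$ pause statements is encoded by one boolean state variable $\ell_i$:

```latex
\ell_1, \ldots, \ell_p \in \{0, 1\}
\quad\Longrightarrow\quad
\#\{\text{control states}\} \le 2^{p},
\qquad
\#\{\text{transition equations } \mathtt{next}(\ell_i) := \ldots\} = p .
```

For example, with $p = 20$ the equation system has 20 transition equations, whereas an explicitly constructed automaton may need up to $2^{20} = 1\,048\,576$ states.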

However, the circuit semantics given as the ‘basic translation’ in [12] suffers
from so-called schizophrenia problems that arise when a statement is terminated and
restarted at the same point of time. This requires the reincarnation of local signals,
which then appear with different values at the same point of time. Moreover, the
circuit parts used to implement parallel statements (the synchronizer circuits) are
erroneously shared by the old and new incarnations of the loop body. Therefore, [12]
suggests a more complicated hardware synthesis to overcome these problems.

Instead, we have defined a new ‘basic translation’ to equation systems that
circumvents the schizophrenia problems of the control flow, i.e., those of parallel
statements. However, we must still handle the schizophrenia problems of the data flow,
i.e., those of local signals. In Section 3, we present our translation in the form of a
hardware synthesis for our language Quartz. Quartz extends Esterel by several
statements used to explicitly program nondeterminism and asynchronous parallel
execution to model reactive systems. As these additional statements cannot simply be
translated to deterministic synchronous hardware circuits, we do not consider them in
this paper. However, we do consider the additional delayed data-manipulating
statements of Quartz.

To ensure the correctness of our translation, we have embedded Quartz in the
interactive theorem prover HOL [14], so that Quartz programs have become part of
HOL’s higher-order logic. We have then defined the hardware synthesis and proved
its correctness, which required proving several lemmata beforehand. Based on the
correctness theorem, we can now use the HOL theorem prover to implement a formal
synthesis tool: the translation of a program can be done by HOL, where a correctness
proof is additionally generated for the particular program. Furthermore, this can be
performed very efficiently: the formal synthesis can be done in polynomial time, and
our experimental results showed that it can even compete with standard compilers.

2 SYNTAX AND INFORMAL SEMANTICS 

Synchronous languages like Esterel or Quartz are mainly concerned with the
implementation of the complex control flow of threads, while data types and
expressions may be borrowed from a host language. Hence, we consider neither types
nor expressions in the following. Our embedding of Quartz in HOL also borrows types
and expressions directly from the HOL logic.

As time is modeled by the natural numbers $\mathbb{N}$, the semantics of an expression is a
function of type $\mathbb{N} \to \alpha$ for some type $\alpha$. In Quartz, we distinguish between event and
valued signals. The semantics of an event signal is a function of type $\mathbb{N} \to \mathbb{B}$, while
the semantics of a valued signal may have the more general type $\mathbb{N} \to \alpha$. Valued
signals are ‘sticky’: they store their value until a data operation is applied. Event
signals, on the other hand, are not sticky: if they are not explicitly made present at the
next point of time, they will be reset to 0 (we denote the boolean values as 1 and 0).
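The difference between event and valued signals can be sketched in Java by computing both kinds of traces from a schedule of emissions and assignments. SignalTrace is hypothetical, values are fixed to int for simplicity, and delayed emissions or assignments are not modeled.

```java
// Hypothetical sketch of the two signal kinds over discrete time t = 0, 1, 2, ...
// An event signal is 0 unless it is emitted in the current instant;
// a valued signal keeps ("sticks to") its last value until it is assigned.
final class SignalTrace {
    private SignalTrace() { }

    // emitted[t] == true means "emit x" executes at instant t.
    static boolean[] eventSignal(boolean[] emitted) {
        boolean[] x = new boolean[emitted.length];
        for (int t = 0; t < emitted.length; t++) {
            x[t] = emitted[t];          // no memory: absent unless emitted now
        }
        return x;
    }

    // assigned[t] == null means no assignment to y at instant t.
    static int[] valuedSignal(Integer[] assigned, int initial) {
        int[] y = new int[assigned.length];
        int current = initial;
        for (int t = 0; t < assigned.length; t++) {
            if (assigned[t] != null) {
                current = assigned[t];  // an assignment changes the value
            }
            y[t] = current;             // otherwise the old value persists
        }
        return y;
    }
}
```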

The basic statements of Quartz are given below: 

Definition 1 (Basic Statements) The following rules define the set of basic statements
of Quartz, where it is assumed that $S$, $S_1$, and $S_2$ are also Quartz statements, $\ell$ is a
label variable, $x$ an event signal, $\sigma$ a boolean expression, $y$ a valued signal, and $\tau$
an expression:

■ nothing (empty statement)

■ $\ell$ : pause (consumption of one logical unit of time)

■ emit $x$ and emit delayed $x$ (signal emissions)

■ $y := \tau$ and $y :=$ delayed $\tau$ (assignments)

■ if $\sigma$ then $S_1$ else $S_2$ end (conditional)

■ $S_1; S_2$ (sequential composition)

■ $S_1 \parallel S_2$ (synchronous parallel composition)

■ while $\sigma$ do $S$ end (iteration)

■ suspend $S$ when $\sigma$ (suspension)

■ weak suspend $S$ when $\sigma$ (weak suspension)

■ abort $S$ when $\sigma$ (abortion)

■ weak abort $S$ when $\sigma$ (weak abortion)

Before giving a precise formal semantics in terms of our new hardware synthesis, we
informally discuss the meaning of the above statements (for further explanations and
examples, we refer to [7]). In general, a statement $S$ is started at a certain point of
time $t_1$, and may terminate at time $t_2 \geq t_1$, but it may also never terminate. If $S$
immediately terminates when it is started ($t_2 = t_1$), it is called instantaneous;
otherwise, we say that the execution of $S$ takes time, or simply that $S$ consumes time.

pause is the only basic statement that consumes time. The statement does not affect
any data values. Each pause statement of a program is endowed with a unique location
variable $\ell$ that we will later use to encode the control flow of the program. nothing is
an empty statement: it simply does nothing, i.e., it neither consumes time nor affects
any data values. emit $x$ immediately makes the event signal $x$ present, i.e., the value
of $x$ at that point of time is then 1. Executing $y := \tau$ immediately changes the value
of $y$ to the current value of the expression $\tau$. The statements emit delayed $x$ and
$y :=$ delayed $\tau$ are defined similarly to emit $x$ and $y := \tau$, respectively, but with a
delay of one unit of time. In the latter statement, $\tau$ is evaluated at the current point
of time, and its value is passed to $y$ at the next point of time. We emphasize that none
of these data-manipulating statements consumes time, but they may affect the signal
values at the next point of time.

if $\sigma$ then $S_1$ else $S_2$ end is the conditional statement that checks whether the
expression $\sigma$ currently evaluates to 1 or 0 and then immediately executes either $S_1$ or
$S_2$ (depending on the value of $\sigma$). $S_1; S_2$ is the sequential execution of $S_1$ and $S_2$,
i.e., we first enter $S_1$ and execute it. If $S_1$ never terminates, then $S_2$ is never
executed at all. If, on the other hand, $S_1$ terminates, we immediately start $S_2$ and
proceed with the execution of $S_2$.

$S_1 \parallel S_2$ denotes the synchronous parallel execution of $S_1$ and $S_2$: if $S_1 \parallel S_2$ is
entered, we enter both $S_1$ and $S_2$ and immediately proceed with both executions. As
long as both $S_1$ and $S_2$ are active, both threads are executed concurrently in lockstep.
If $S_1$ terminates, but $S_2$ does not, then $S_1 \parallel S_2$ behaves further as $S_2$ does (and vice
versa). If finally $S_2$ terminates, then so does $S_1 \parallel S_2$.

while $\sigma$ do $S$ end implements iteration: if this statement is entered, two cases are
to be distinguished. If $\sigma$ does not hold, then the statement instantaneously terminates.
Otherwise, we immediately execute $S$. It is possible that $S$ never terminates. However,
if $S$ terminates, and at that point of time $\sigma$ holds again, then $S$ is immediately
restarted.

(weak) suspend $S$ when $\sigma$ implements process suspension, i.e., $S$ is entered when
the execution of this statement starts, regardless of the current value of $\sigma$. For the
following points of time, the execution of $S$ only proceeds if $\sigma$ evaluates to 0;
otherwise, its execution is suspended until $\sigma$ allows a further execution.

(weak) abort $S$ when $\sigma$ implements process abortion: $S$ is immediately entered,
regardless of the current value of $\sigma$. $S$ is then executed as long as $\sigma$ is 0. If $\sigma$
becomes 1 during the execution of $S$, then $S$ is immediately aborted. Hence, abort $S$
when $\sigma$ can either terminate ‘normally’ (when the execution of $S$ terminates), or it
may terminate by abortion (when $\sigma$ enforces this).
The ‘weak’ variants of process suspension and abortion differ only in their treatment
of the data manipulations at suspension or abortion time: the strong variants ignore all
of these data manipulations, whereas the weak variants perform all of them. There are
also immediate variants of suspension and abortion that also consider the value of the
condition $\sigma$ at starting time. These can be defined in terms of the other statements.
Many other statements can also be defined as macro expansions of basic statements.

Esterel has the same basic statements as the ones above. Esterel also has event
signals, as defined here. Valued signals of Quartz correspond to Esterel’s valued
signals, but we omit the status of these signals, as the status can be implemented by
additional event signals, if wanted. Finally, Esterel has variables, which can take more
than one value at a point of time. We do not consider these variables in this paper,
although they do appear in Quartz as well.

3 TRANSLATING QUARTZ PROGRAMS TO EQUATION SYSTEMS

In this section, we present our new translation of Quartz and Esterel programs to
equivalent equation systems. Similar to the state-of-the-art translation [12], our new
translation is based on a recursive translation, where each program statement is
implemented by a hardware circuit template. In contrast to [12], however, we
distinguish between the control flow and the data flow of the program. Moreover, the
hardware circuits we use have different inputs and outputs, which allow a more
efficient translation. In particular, we do not need synchronizer circuits for the
translation of parallel statements, and hence, our translation does not suffer from the
resulting schizophrenia problems. In the following three subsections, we first define
the control flow, then the data flow, and finally combine both into a single equation
system.

3.1 The Control Flow

In this section, we define the first part of the equation system, which describes the
control flow of the program. It is convenient to describe the recursive computation of
this equation system by means of the hardware circuit templates listed in Figure 1. It
is straightforward to extract from the circuit netlist the transition functions of the
flipflops, which then form our equation system for the control flow.

The circuits used in Figure 1 have the boolean-valued inputs start, susp, and kill,
and the boolean-valued outputs inst, insd, and term. The event and valued signals that
occur in the program are collected in the additional input $E$ (the environment). As
the circuits given in Figure 1 only compute the control flow, they only read the values
of $E$ and do not manipulate them (this is the task of the data flow).

The meaning of the other inputs and outputs is the following: start is used to start
the execution of the program implemented by the circuit, and susp is used to suspend
the current computation, i.e., susp simply ‘freezes’ all flipflops of the circuit. The
kill input is used to abort the current computation by simply resetting all flipflops
of the circuit. inst signals that the circuit is instantaneous, i.e., inst holds iff the
computation would immediately terminate if it were started with the current
environment $E$. insd is the disjunction of all flipflops of the circuit, thus meaning that
the control flow is currently somewhere inside the circuit. Finally, the output term
indicates that the current computation will now terminate (but the control flow is still
in the circuit). If term holds and start does not hold, the control flow will leave the
circuit, so that insd will be false at the next point of time. To avoid confusion, in the
following we denote the inst, insd, and term outputs of the circuit of a statement
$S$ as inst($S$), insd($S$), and term($S$).

Figure 1 Hardware Circuits to Compute the Control Flow

We have formally proved with the theorem prover HOL [14] that the recursive
hardware synthesis rules given in Figure 1 correctly implement the control flow
of a program. To be precise, the result is the following theorem (as $S$ is equivalent
to suspend abort $S$ when 0 when 0, we can instantiate kill := susp := 0 to compute
the actual control flow of $S$):

Theorem 1 (Correctness of the Control Flow Computation) For any Quartz
statement $S$, the hardware circuit $C(S)$ as generated by the rules given in Figure 1
implements the control flow of the statement suspend abort $S$ when kill when susp,
provided that the following constraints always hold:

■ $\neg(\mathit{kill} \wedge \mathit{susp})$

■ $\mathit{insd} \wedge \neg\mathit{kill} \wedge (\neg\mathit{term} \vee \mathit{susp}) \rightarrow \neg\mathit{start}$

■ For any loop body $B$ of a substatement while $\sigma$ do $B$ end of $S$, the condition

$\mathit{term}(B) \wedge \sigma \rightarrow \neg\mathit{inst}(B)$ must always hold.

The first constraint means that circuits must never be both suspended and killed at
the same point of time. The second constraint roughly means that we must not start
circuits that are already active (insd), unless they are aborted (kill) or terminate
without being suspended (term and not susp) at that point of time. The third constraint
means that loop bodies must not be instantaneous. In fact, our constraint is a bit
weaker, in that it allows loop bodies to be instantaneous if either $\sigma$ does not hold, or
the loop body does not currently terminate.

It is easily seen from the rules of Figure 1 that pause statements correspond to
flipflops of the circuit. Given that $\ell_1, \ldots, \ell_p$ are the label variables of the pause
statements that occur in a statement $S$, we can therefore derive from the circuit an
equation system of the form $\{\mathtt{init}(\ell_i) := 0 \mid 1 \leq i \leq p\} \cup \{\mathtt{next}(\ell_i) := \psi_i \mid 1 \leq i \leq p\}$,
where $\psi_i$ denotes the input of the flipflop $\ell_i$. The equations $\mathtt{init}(\ell_i) := 0$ determine
the initial state, and the equations $\mathtt{next}(\ell_i) := \psi_i$ determine all transitions of the
control flow. To distinguish the starting state from a possible termination state, we
furthermore use an additional state variable $\ell_0$ (the boot location) with the equations
$\mathtt{init}(\ell_0) := 1$ and $\mathtt{next}(\ell_0) := 0$. Finally, we equate the start input with $\ell_0$ and the
kill and susp inputs with 0.
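For a worked instance, take $S = \ell_1 : \mathtt{pause};\ \ell_2 : \mathtt{pause}$. Figure 1 itself is not reproduced here, so this example makes three assumptions: each pause circuit is a single flipflop loaded from its start input, $\mathit{term}(\ell : \mathtt{pause}) = \ell$ when susp $= 0$, and the second pause is started by the termination of the first. Under these assumptions, the construction yields:

```latex
\begin{array}{ll}
\mathtt{init}(\ell_0) := 1, & \mathtt{next}(\ell_0) := 0 \\
\mathtt{init}(\ell_1) := 0, & \mathtt{next}(\ell_1) := \ell_0 \\
\mathtt{init}(\ell_2) := 0, & \mathtt{next}(\ell_2) := \ell_1
\end{array}
```

Running these equations activates $\ell_0$, $\ell_1$, and $\ell_2$ at the instants 0, 1, and 2, respectively. The second pause terminates at instant 2, and from instant 3 on no location is active.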

3.2 Defining the Data Flow 

We will now define the data flow part of the equation system. This is based on the 
set of guarded commands of S. In general, a guarded command is of the form ( 7 , c), 
where 7 is a condition and c is a data manipulating statements, i.e., a statement of 
one of the following types: emit x, emit delayed x, y := r, or y := delayed r. The 
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intuition behind a guarded command ( 7 , c) is that whenever the guard condition 7 is 
satisfied, then the command c must be immediately executed. Guarded commands 
may themselves be viewed as a programming language (like Unity [15]) when we as- 
sume that each guarded command is a separate process. The set of guarded commands 
of a statement is computed as follows: 

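Definition 2 (Guarded Commands of Statements) Given any Quartz statement $S$,
we define the guarded commands $\mathit{guardcmd}(\varphi, S)$ of $S$ wrt. the initial condition $\varphi$
as:

■ $\mathit{guardcmd}(\varphi, \mathtt{nothing}) = \{\}$

■ $\mathit{guardcmd}(\varphi, \ell : \mathtt{pause}) = \{\}$

■ $\mathit{guardcmd}(\varphi, \mathtt{emit}\ x) = \{(\varphi, \mathtt{emit}\ x)\}$

■ $\mathit{guardcmd}(\varphi, \mathtt{emit\ delayed}\ x) = \{(\varphi, \mathtt{emit\ delayed}\ x)\}$

■ $\mathit{guardcmd}(\varphi, y := \tau) = \{(\varphi, y := \tau)\}$

■ $\mathit{guardcmd}(\varphi, y := \mathtt{delayed}\ \tau) = \{(\varphi, y := \mathtt{delayed}\ \tau)\}$

■ $\mathit{guardcmd}(\varphi, \mathtt{if}\ \sigma\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2\ \mathtt{end})$

$= \mathit{guardcmd}(\varphi \wedge \sigma, S_1) \cup \mathit{guardcmd}(\varphi \wedge \neg\sigma, S_2)$

■ $\mathit{guardcmd}(\varphi, S_1; S_2)$

$= \mathit{guardcmd}(\varphi, S_1) \cup \mathit{guardcmd}(\mathit{inst}(S_1) \wedge \varphi \vee \mathit{term}(S_1), S_2)$

■ $\mathit{guardcmd}(\varphi, S_1 \parallel S_2) = \mathit{guardcmd}(\varphi, S_1) \cup \mathit{guardcmd}(\varphi, S_2)$

■ $\mathit{guardcmd}(\varphi, \mathtt{while}\ \sigma\ \mathtt{do}\ S\ \mathtt{end}) = \mathit{guardcmd}((\varphi \vee \mathit{term}(S)) \wedge \sigma, S)$

■ $\mathit{guardcmd}(\varphi, \mathtt{suspend}\ S\ \mathtt{when}\ \sigma)$

$= \{(\gamma \wedge (\mathit{insd}(S) \rightarrow \neg\sigma), c) \mid (\gamma, c) \in \mathit{guardcmd}(\varphi, S)\}$

■ $\mathit{guardcmd}(\varphi, \mathtt{weak\ suspend}\ S\ \mathtt{when}\ \sigma) = \mathit{guardcmd}(\varphi, S)$

■ $\mathit{guardcmd}(\varphi, \mathtt{abort}\ S\ \mathtt{when}\ \sigma)$

$= \{(\gamma \wedge (\mathit{insd}(S) \rightarrow \neg\sigma), c) \mid (\gamma, c) \in \mathit{guardcmd}(\varphi, S)\}$

■ $\mathit{guardcmd}(\varphi, \mathtt{weak\ abort}\ S\ \mathtt{when}\ \sigma) = \mathit{guardcmd}(\varphi, S)$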

The above definition should be clear; we only explain the case of a sequence $S_1; S_2$.
That $\mathit{guardcmd}(\varphi, S_1; S_2)$ contains $\mathit{guardcmd}(\varphi, S_1)$ is obvious. For the
second part, we have to distinguish between two cases. On the one hand, $S_1$ may be
instantaneous, so that the last location was the one described by $\varphi$. Hence, we have
to compute $\mathit{guardcmd}(\mathit{inst}(S_1) \wedge \varphi, S_2)$. On the other hand, $S_1$ may not be
instantaneous. In this case, we have to compute the last location inside $S_1$ where the
control flow has been before leaving $S_1$. As this is encoded in $\mathit{term}(S_1)$, we simply
have to add $\mathit{guardcmd}(\mathit{term}(S_1), S_2)$.

Note that the weak and strong variants of suspension and abortion differ in that the
strong variants replace the guards $\gamma$ of $S$ by $\gamma \wedge (\mathit{insd}(S) \rightarrow \neg\sigma)$, so that no data
manipulation takes place at abortion/suspension time.
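Definition 2 is directly executable as a recursive function. The Java sketch below covers only emit, pause, conditionals, and sequences. It represents guards as symbolic formulas (strings) with trivial constant folding and assumes kill = susp = 0. Because Figure 1 is not reproduced here, the inst and term functions needed for the sequence rule are hypothetical readings of the informal semantics: emissions are instantaneous with term = 0, term(ℓ: pause) = ℓ, and term(S1; S2) = (term(S1) ∧ inst(S2)) ∨ term(S2).

```java
import java.util.List;

// Hypothetical sketch of Definition 2 for emit, pause, if, and sequence only.
// Guards are symbolic formulas; kill = susp = 0 is assumed throughout.
abstract class Stmt {
    abstract String inst();   // assumed: holds iff the statement would be instantaneous now
    abstract String term();   // assumed: holds iff control leaves a pause inside it now
    abstract void guardcmd(String phi, List<String> out);

    static String and(String p, String q) {
        if (p.equals("0") || q.equals("0")) return "0";
        if (p.equals("1")) return q;
        if (q.equals("1")) return p;
        return "(" + p + " & " + q + ")";
    }

    static String or(String p, String q) {
        if (p.equals("1") || q.equals("1")) return "1";
        if (p.equals("0")) return q;
        if (q.equals("0")) return p;
        return "(" + p + " | " + q + ")";
    }

    static String not(String p) {
        return p.equals("0") ? "1" : p.equals("1") ? "0" : "!" + p;
    }
}

class Emit extends Stmt {
    final String x;
    Emit(String x) { this.x = x; }
    String inst() { return "1"; }
    String term() { return "0"; }
    void guardcmd(String phi, List<String> out) { out.add("(" + phi + ", emit " + x + ")"); }
}

class Pause extends Stmt {
    final String label;
    Pause(String label) { this.label = label; }
    String inst() { return "0"; }
    String term() { return label; }
    void guardcmd(String phi, List<String> out) { }   // pause manipulates no data
}

class If extends Stmt {
    final String sigma;
    final Stmt s1, s2;
    If(String sigma, Stmt s1, Stmt s2) { this.sigma = sigma; this.s1 = s1; this.s2 = s2; }
    String inst() { return or(and(sigma, s1.inst()), and(not(sigma), s2.inst())); }
    String term() { return or(s1.term(), s2.term()); }
    void guardcmd(String phi, List<String> out) {
        s1.guardcmd(and(phi, sigma), out);
        s2.guardcmd(and(phi, not(sigma)), out);
    }
}

class Seq extends Stmt {
    final Stmt s1, s2;
    Seq(Stmt s1, Stmt s2) { this.s1 = s1; this.s2 = s2; }
    String inst() { return and(s1.inst(), s2.inst()); }
    String term() { return or(and(s1.term(), s2.inst()), s2.term()); }
    void guardcmd(String phi, List<String> out) {
        s1.guardcmd(phi, out);
        // S2 starts if S1 was instantaneous (from phi) or if S1 terminates now.
        s2.guardcmd(or(and(s1.inst(), phi), s1.term()), out);
    }
}
```

For emit a; l1: pause; if s then emit b else emit c end with the initial condition l0, the sketch yields (l0, emit a), ((l1 & s), emit b), and ((l1 & !s), emit c).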

For the definition of the data flow, we have to take into account that event and
valued signals are handled differently. Remember that the values of event signals
must be computed anew at each point of time, whereas the values of valued signals
are stored unless an assignment changes them. A further problem in Quartz is that we
must also cope with delayed emissions and assignments. Hence, we give the following
definition (which, however, only holds if there are no write conflicts).



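Definition 3 (Data Flow of Statements) Assume the guarded commands $(\alpha_1, \mathtt{emit}\ x), \ldots, (\alpha_m, \mathtt{emit}\ x)$ and $(\beta_1, \mathtt{emit\ delayed}\ x), \ldots, (\beta_n, \mathtt{emit\ delayed}\ x)$ are the only emissions of the event signal $x$ in a statement $S$ for the initial condition $\varphi$. Then, we define:

■ $\mathcal{I}_{\mathrm{df}}(\varphi, x, S) = \{\mathsf{init}(x) := \bigvee_{i=1}^{m} \alpha_i\}$

■ $\mathcal{R}_{\mathrm{df}}(\varphi, x, S) = \{\mathsf{next}(x) := (\bigvee_{i=1}^{n} \beta_i) \vee \mathsf{next}(\bigvee_{i=1}^{m} \alpha_i)\}$

Further, assume the guarded commands $(\alpha_1, y := \tau_1), \ldots, (\alpha_m, y := \tau_m)$ and $(\beta_1, y := \mathtt{delayed}\ \pi_1), \ldots, (\beta_n, y := \mathtt{delayed}\ \pi_n)$ are the only assignments to the valued signal $y$ in a statement $S$ for the initial condition $\varphi$. Then, we define:

■ $$\mathcal{I}_{\mathrm{df}}(\varphi, y, S) = \left\{ \mathsf{init}(y) := \begin{array}{l} \mathtt{if}\ \alpha_1\ \mathtt{then}\ \tau_1 \\ \quad\vdots \\ \mathtt{elsif}\ \alpha_m\ \mathtt{then}\ \tau_m \\ \mathtt{else}\ \text{`some initial value'} \end{array} \right\}$$

■ $$\mathcal{R}_{\mathrm{df}}(\varphi, y, S) = \left\{ \mathsf{next}(y) := \begin{array}{l} \mathtt{if}\ \mathsf{next}(\alpha_1)\ \mathtt{then}\ \mathsf{next}(\tau_1) \\ \mathtt{elsif}\ \mathsf{next}(\alpha_2)\ \mathtt{then}\ \mathsf{next}(\tau_2) \\ \quad\vdots \\ \mathtt{elsif}\ \mathsf{next}(\alpha_m)\ \mathtt{then}\ \mathsf{next}(\tau_m) \\ \mathtt{elsif}\ \beta_1\ \mathtt{then}\ \pi_1 \\ \quad\vdots \\ \mathtt{elsif}\ \beta_n\ \mathtt{then}\ \pi_n \\ \mathtt{else}\ y \end{array} \right\}$$

Given a statement $S$ with the outputs $x_1, \ldots, x_m$, define the data flow of $S$ as the following equation system:

■ $\mathcal{I}_{\mathrm{df}}(\varphi, S) = \bigcup_{i=1}^{m} \mathcal{I}_{\mathrm{df}}(\varphi, x_i, S)$

■ $\mathcal{R}_{\mathrm{df}}(\varphi, S) = \bigcup_{i=1}^{m} \mathcal{R}_{\mathrm{df}}(\varphi, x_i, S)$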
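The shape of the equations in Definition 3 can be reproduced with a small string-building sketch. The function names and the placeholder `some_initial_value` are hypothetical. The sketch only assembles the right-hand sides from given guards and expressions. Like the definition, it assumes there are no write conflicts.

```python
def disj(formulas):
    """Disjunction of guard strings; the empty disjunction is false."""
    return "(" + " | ".join(formulas) + ")" if formulas else "false"


def event_dataflow(x, alphas, betas):
    """init/next equations of an event signal x.

    alphas: guards of 'emit x'; betas: guards of 'emit delayed x'.
    """
    init = f"init({x}) := {disj(alphas)}"
    nxt = f"next({x}) := {disj(betas)} | next({disj(alphas)})"
    return init, nxt


def if_chain(branches, default):
    """if/elsif/.../else expression; earlier branches take priority."""
    if not branches:
        return default
    (c0, v0), rest = branches[0], branches[1:]
    parts = [f"if {c0} then {v0}"] + [f"elsif {c} then {v}" for c, v in rest]
    return " ".join(parts + [f"else {default}"])


def valued_dataflow(y, immediate, delayed, initial="some_initial_value"):
    """init/next equations of a valued signal y.

    immediate: pairs (alpha_i, tau_i) for 'y := tau_i'
    delayed:   pairs (beta_j, pi_j) for 'y := delayed pi_j'
    """
    init = f"init({y}) := {if_chain(immediate, initial)}"
    next_branches = [(f"next({a})", f"next({t})") for a, t in immediate] + list(delayed)
    nxt = f"next({y}) := {if_chain(next_branches, y)}"  # otherwise y keeps its value
    return init, nxt
```

The branch order in `if_chain` encodes the priority of the definition. Immediate assignments of the next instant (`next(alpha_i)`) are checked before delayed assignments scheduled in the current instant (`beta_j`), and `y` keeps its value otherwise.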



3.3 Combining Control and Data Flow 

Having defined the control flow and the data flow equation systems, it is now easy to
combine both to obtain a complete description of the semantics of a statement:

Definition 4 (Equation System of a Statement) Given a statement $S$ with an initial
location $\varphi$, we define the equation system for $\varphi$ and $S$ as follows:

■ $\mathcal{I}(\varphi, S) = \mathcal{I}_{\mathrm{cf}}(\varphi, S) \cup \mathcal{I}_{\mathrm{df}}(\varphi, S)$

■ $\mathcal{R}(\varphi, S) = \mathcal{R}_{\mathrm{cf}}(\varphi, S) \cup \mathcal{R}_{\mathrm{df}}(\varphi, S)$
4 APPLICATIONS AND SUMMARY

To summarize, we have developed a new translation of Esterel/Quartz programs into
equivalent equation systems. These equation systems can be used for hardware
and software synthesis, but also for formal verification, which is our main focus.
In particular, besides model checking techniques [13], the translation to equation
systems also allows the use of term rewriting techniques as available in some
theorem provers like HOL. Our translation does not suffer from schizophrenia problems
in the control flow (schizophrenic synchronizers) [12]. The reason for this is that
our circuit templates trigger themselves, i.e., we do not need an additional ‘resume’
input. This makes the entire hardware synthesis clearer and even more efficient: a
simple comparison shows that our circuit templates require fewer gates than previous
translations [12]. We have moreover proved the correctness of our translation with
the HOL theorem prover, so that the theorem prover can even be used to implement a
formal synthesis tool.

References 

[1] Esterel Web. http://www.esterel.org.

[2] Simulog. http://www.simulog.fr.

[3] Cadence Design Systems, Inc. http://www.cadence.com.

[4] ECL Homepage. http://www-cad.eecs.berkeley.edu/

[5] Jester Homepage. http://www.parades.rm.cnr.it/projects/jester/jester.html.

[6] G. Berry. The foundations of Esterel. In G. Plotkin, C. Stirling, and M. Tofte,
editors, Proof, Language and Interaction: Essays in Honour of Robin Milner.
MIT Press, 1998.

[7] G. Berry. The Esterel v5_91 language primer. http://www.esterel.org, June 2000.

[8] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous dataflow
programming language LUSTRE. Proc. of the IEEE, 79(9):1305-1320, 1991.

[9] P. Le Guernic, T. Gautier, M. Le Borgne, and C. Le Maire. Programming real-time
applications with SIGNAL. Proc. of the IEEE, 79(9):1321-1336, 1991.

[10] D. Harel. Statecharts: A visual formalism for complex systems. Science of
Computer Programming, pp. 231-274, 1987.

[11] Ch. André. SyncCharts: A visual representation of reactive behaviors. Research
report tr95-52, University of Nice, Sophia Antipolis, 1995.

[12] G. Berry. The constructive semantics of pure Esterel, July 1999.

[13] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic
Model Checking: 10^20 States and Beyond. IEEE Symposium on Logic in Computer
Science, pp. 1-33, Washington, June 1990. IEEE Computer Society Press.

[14] M.J.C. Gordon and T.F. Melham. Introduction to HOL: A Theorem Proving
Environment for Higher Order Logic. Cambridge University Press, 1993.

[15] K.M. Chandy and J. Misra. Parallel Program Design: A Foundation. Addison-Wesley,
1988.




EXPLORA - GENERIC DESIGN SPACE 
EXPLORATION DURING EMBEDDED 

SYSTEM SYNTHESIS 

Frank Cieslok, Heinrich Esau, Jürgen Teich

Computer Engineering Lab (DATE), University of Paderborn
Warburger Straße 100, 33098 Paderborn, Germany
email: teich@date.upb.de 



The need for design space exploration on different levels of
abstraction during synthesis of electronic systems has received
wide attention recently. Unfortunately, there are almost no tools
available on the EDA market that allow a designer to easily extend
his synthesis tool suite with design space exploration capabilities.
A versatile tool for design space exploration must be targetable
to different synthesis tools. Also, it should be possible to connect
different optimization (exploration) algorithms to such a versatile
tool. Here, we present an approach that enables design space
exploration with support for coupling different exploration
algorithms and synthesis tools. Our JAVA-based tool called
EXPLORA is also able to visualize the exploration results and can
be adapted to new problems and abstraction levels within hours.



1. Introduction 

The goal of this work is to design a tool that enables design space exploration
during the design of embedded systems, i.e., during hardware/software codesign.
Only recently has this problem been recognized. The motivation is the need to estimate the
quality of different design alternatives in early design phases and to propose to the
designer a number of optimal solutions with different characteristics (e.g., speed
versus cost, power, etc.). From these, he/she may choose one solution and refine it into
the final product. In this area, multiobjective optimization problems have
to be solved. An approach that uses evolutionary algorithms for the exploration of
Pareto-optimal fronts can be found in [1] for system-level synthesis (a generalization
of hardware/software partitioning), and in [4], [6], [5] for optimal code synthesis for
DSP processors from data flow graphs.

Unfortunately, existing tools have several shortcomings that make their reuse simply impossible:
they are too specialized to be used for different synthesis problems, too tightly coupled to the
tools that are used for evaluating the quality of a design point, or heavily dependent on
architecture assumptions, design abstraction, etc. Here, we present the structure of a versatile
tool for design space exploration called EXPLORA that may be easily incorporated into any level
of design abstraction. Its flexibility results from addressing the following problems and
requirements:

• Formal (functional) quantification of the nature of design space exploration
processes involving synthesis tasks.

• Clear separation between
  - problem-specific parameters (e.g., dimension of exploration space,
    metrics, cost function, etc.),
  - independence of synthesis algorithm and implementation language,
  - independence of optimization (exploration) algorithm and
    implementation language,
  - visualization support.

• Finally, it should be easy to couple such a design space exploration tool to
existing environments.

First, we give a characterization of design space exploration processes during 
embedded system synthesis. In Section 3, we present the mathematical background 
to formalize the process of generic design space exploration. In Section 4, the 
structure of EXPLORA is introduced. Finally, in Section 5, we present as a case study a
typical scenario for coupling the well-known Synopsys Behavioral Compiler (see, e.g., [3])
for design space exploration during high-level synthesis.



2. Characterization of Design Space Exploration Processes

2.1 Structure of design space exploration processes

Two of the basic requirements of a flexible tool for design space exploration are a) 
the exchangeability of the optimization algorithm that is used for exploration of the 
design space and b) its adaptability to different synthesis tools that may be used to 
compute the quality of points in the design space concerning cost, speed, and other 
metrics. A natural approach is to split the process of design space exploration into
three main tasks. The first covers all synthesis-tool-specific behavior. The second
concerns the optimization algorithm for evaluating solutions and selecting new points
in the design space, with the goal of obtaining a high diversity (covering) of optimal
points. The third module manages the exploration process itself using handles to the
other modules.

Since the optimization algorithm needs a cost function which rates a given result
produced by the synthesis tool, it is useful to split the cost function computation off
from the optimization module into a separate module (see Fig. 1). This
way, the user can also change the cost function easily, without the need to
exchange the optimization algorithm.

Figure 1 shows the structure of a tool for generic design space exploration. The 
exploration manager starts a synthesis tool with certain parameters and obtains the 
synthesis results of this tool. This result is then forwarded to the optimization
algorithm that is used during the exploration. This algorithm provides the next
parameter set(s) (new design point(s)) to explore. The optimization module in turn
uses a cost function which rates a given result. Finally, it is also desirable to have a
graphical user interface (GUI) that gathers and visualizes the progress and results
during the exploration process.

This section deals with the description of the modules in Fig. 1. We will describe
their behavior using functional abstractions. First, some explanations are in order. 

3.1 Explanations

Before describing the behavior of the exploration program modules, we must
specify which kind of data is exchanged. Figure 1 shows three different data
structures: parameters, result and costs. The parameters-object characterizes a
design point and can be described by a set of parameters needed by the synthesis tool
to perform a synthesis.





Figure 1: Structure of a program for design space exploration 

A result-object represents that part of the output of a synthesis tool which is 
important to control the exploration flow and which is needed by the user to rate this 
result. Basically, it consists of a set of quantities representing the properties of the 
synthesis tool output. 

Example 3.1

Consider, for instance, the abstraction level of high-level architectural synthesis and
the Synopsys Behavioral Compiler [3]. Among others, this tool has the option
parameter io_mode with the possible values cycle_fixed, free_float and
superstate.¹ It further needs a file with the behavioral description of a certain
design, e.g., a VHDL or Verilog file. The filename is interpreted as a parameter, too.
The Behavioral Compiler should produce a synthesized data path plus controller when
invoked with these two parameters and proper values. To evaluate the quality of such
a synthesized design, the Behavioral Compiler, like any synthesis tool, typically outputs
some result and log files. Special postprocessing steps are usually needed to extract
the quality metrics important for design evaluation, such as the area requirement of the
synthesized design, its latency, the minimum possible clock cycle time, etc. These are
typical synthesis results needed for rating the quality of a design during design space
exploration. Each parameter set producing a different design is called a design point.

¹ The values of this parameter denote whether I/O operations of a given design must be scheduled in
certain fixed cycles, in arbitrary cycles, or respecting a certain partial order, respectively.

The costs-object may often simply be described by a tuple of real numbers,
which must be properly interpreted by the optimization algorithm to rate a certain
result. With these explanations, the task of each exploration program module can be
defined by a functional abstraction of that module.

3.2 Tool starter module abstraction 

Each synthesis tool needs some input files to perform a synthesis and produces some 
output files as a result. During the exploration, possibly not the whole input should 
be modified (e.g., the design specification stays the same) and not all parts of the 
output (e.g., command log files) should be taken into consideration. So the task of 
the tool starter in Fig. 1 is to produce the complete input needed by the synthesis tool 
given a parameter set and to extract the desired result quantities from the output 
returned by the synthesis tool after its completion. The tool starter sits on top of the 
corresponding synthesis tool and behaves like an independent tool to its invoker, so 
its behavior can be described by a function as follows: 

Let a certain synthesis tool starter have $n$ parameters with the domain $P_j$
for the parameter $p_j$, $j = 1, \ldots, n$. Hence, a design point may also be characterized by
an $n$-tuple $p$ without loss of generality. Let the corresponding tool produce $m$
result quantities that may be represented by a tuple $q$. Let $Q_k$ be the
domain of the quantity $q_k$, $k = 1, \ldots, m$. If there are no other constraints, then
$P = P_1 \times P_2 \times \ldots \times P_n$ is called the design space and
$Q = Q_1 \times Q_2 \times \ldots \times Q_m$ the result space. The behavior of this synthesis
tool starter can be described by a function

$$\mathit{synth}: P \rightarrow Q \qquad (1)$$

Example 3.2 

Let io_mode and vhdl_file in the previous example be the parameters of the
Synopsys Behavioral Compiler. The tool starter would expand this parameter set and
afterwards extract proper values for the area and latency time of the synthesized design.
Suppose the name of the VHDL file is design.vhdl. Then the domain of the
function $\mathit{synth}$ would be $P = \{\texttt{design.vhdl}\} \times \{\texttt{cycle\_fixed}, \texttt{superstate}, \texttt{free\_float}\}$
and the range of result values $Q = Q_1 \times Q_2$, with $Q_1$ the domain of the area and $Q_2$
the domain of the latency time. For each valid VHDL file and for
each value of io_mode, the function $\mathit{synth}$ would produce a pair of values for the
area and the latency time.
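The design space of Example 3.2 is simply the Cartesian product of the parameter domains. Below is a minimal Python sketch of how it can be enumerated. It assumes design points are represented as dictionaries keyed by hypothetical parameter names (`vhdl_file`, `io_mode`).

```python
from itertools import product
from typing import Dict, Iterator, List

DesignPoint = Dict[str, str]

# Parameter domains P_j of Example 3.2: the file name is fixed, io_mode varies.
DOMAINS: Dict[str, List[str]] = {
    "vhdl_file": ["design.vhdl"],
    "io_mode": ["cycle_fixed", "superstate", "free_float"],
}


def design_space(domains: Dict[str, List[str]]) -> Iterator[DesignPoint]:
    """Enumerate P = P_1 x ... x P_n as named parameter tuples."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        yield dict(zip(names, values))
```

Enumerating `design_space(DOMAINS)` gives the three design points of the example, one per io_mode value. A real tool starter would then map each of them to a result tuple through $\mathit{synth}$.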

3.3 Cost function abstraction 

For rating a synthesis result, the optimization algorithm needs a cost function which
builds a tuple of costs from a tuple of result quantities. Let there be $l$ cost
quantities. Let $C_j$ be the value range of cost parameter $c_j$, $j = 1, \ldots, l$. The range of
values of the resulting cost tuple $c$ is $C = C_1 \times C_2 \times \ldots \times C_l$. The cost function is
then

$$\mathit{cost}: Q \rightarrow C \qquad (2)$$

Example 3.3 

Continuing the previous example, one cost function would be the weighted
one-dimensional function ($l = 1$, $m = 2$)

$$\mathit{cost}(\mathit{area}(p), \mathit{latency}(p)) = 0.7 \cdot \mathit{area}(p) + 0.3 \cdot \mathit{latency}(p).$$

This way, the cost function would weight the area of a design point $p$ more than its
latency time and thereby steer the exploration toward smaller rather than faster
designs.
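The weighted cost of Example 3.3 is a one-line function. In the sketch below, passing the weights as default arguments is an assumption. It shows why the function favors smaller designs. With these weights, a design with area 100 and latency 200 costs 130, whereas one with area 200 and latency 100 costs 170.

```python
def weighted_cost(area: float, latency: float,
                  w_area: float = 0.7, w_latency: float = 0.3) -> float:
    """One-dimensional cost (l = 1) from m = 2 result quantities (Example 3.3)."""
    return w_area * area + w_latency * latency
```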

Example 3.4

Let us consider, without loss of generality, a multi-objective minimization problem
with $m$ result parameters for each design point $p$ of dimension $n$ and $l$ objectives.
Let $q = \mathit{synth}(p)$.

$$\text{Minimize } c = \mathit{cost}(q) = (\mathit{cost}_1(q), \ldots, \mathit{cost}_l(q)) \qquad (3)$$

where $q = (q_1, \ldots, q_m) \in Q$ and $c = (c_1, \ldots, c_l) \in C$ are tuples with
$c_j = \mathit{cost}_j(q)$, $j = 1, \ldots, l$. $a \in Q$ is said to dominate $b \in Q$ (also written as $a \succ b$)
iff

$$\forall i \in \{1, \ldots, l\}: \mathit{cost}_i(a) \le \mathit{cost}_i(b) \;\wedge\; \exists j \in \{1, \ldots, l\}: \mathit{cost}_j(a) < \mathit{cost}_j(b) \qquad (4)$$

$a$ covers $b$ ($a \succeq b$) iff $a \succ b$ or $\mathit{cost}(a) = \mathit{cost}(b)$. All design points $p_i \in P$ with
the property that $q_i = \mathit{synth}(p_i)$ is not dominated by any other $q_j = \mathit{synth}(p_j)$,
$p_j \in P$, are called nondominated. Pareto-optimal points are the nondominated
design points of the entire search space $P$.

For design space exploration, a useful cost function is to let $\mathit{cost}(x)$ be equal to
the number of design points explored so far that dominate $x$. Hence, after the
exploration, all explored points with cost zero are (approximations of) Pareto-optimal
points.
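Relation (4), the covering relation and the dominance-count cost function translate directly into code. This Python sketch works on cost vectors of a minimization problem. The function names are hypothetical.

```python
from typing import List, Sequence

Cost = Sequence[float]


def dominates(a: Cost, b: Cost) -> bool:
    """a dominates b as in (4): no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def covers(a: Cost, b: Cost) -> bool:
    """a covers b iff a dominates b or both cost vectors are equal."""
    return dominates(a, b) or tuple(a) == tuple(b)


def dominance_cost(x: Cost, explored: Sequence[Cost]) -> int:
    """Number of explored points that dominate x."""
    return sum(1 for y in explored if dominates(y, x))


def nondominated(explored: Sequence[Cost]) -> List[Cost]:
    """Explored points with dominance cost zero (approximate Pareto front)."""
    return [x for x in explored if dominance_cost(x, explored) == 0]
```

Take the vectors (3, 1), (1, 3) and (3, 3). The third is dominated by both others, so its dominance cost is 2, and `nondominated` returns the first two.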

3.4 Optimization module abstraction 

The task of the optimization module is to produce a set of new parameter sets (design
points) to explore next, given a set of design points and their ratings.

If a given optimization algorithm needs $\nu$ different synthesis results to generate $\omega$
new design points, then the behavior of the optimization module could be described
by a function

$$\mathit{opt}: (P \times C)^{\nu} \rightarrow P^{\omega} \qquad (5)$$

Example 3.5

Consider an optimization (exploration) algorithm that is population-based, e.g., a
variant of an Evolutionary Algorithm that simply selects the best result from $n$ given
result objects using a certain cost function and produces $n$ identical copies of this
optimal design point as offspring, however, with random variations in its parameters.
In the next iteration of the exploration, this set of mutated design points would be
used by the tool starter and produce new synthesis results. This way, the algorithm
would implement the function $\mathit{opt}: (P \times C)^{n} \rightarrow P^{n}$ with $C = \mathit{cost}(\mathit{synth}(P))$.
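Here is a minimal sketch of the population-based optimizer described in Example 3.5. It assumes a scalar cost to be minimized and categorical parameter domains. The mutation probability `p_mut` and the per-parameter re-drawing are illustrative choices, not details given in the text.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[str, ...]


def ea_opt(rated: Sequence[Tuple[Point, float]], domains: Sequence[Sequence[str]],
           rng: random.Random, p_mut: float = 0.5) -> List[Point]:
    """opt : (P x C)^n -> P^n in the spirit of Example 3.5.

    Selects the best of the n rated design points (lowest cost) and returns
    n copies of it; each parameter is re-drawn from its domain with
    probability p_mut (the "random variations").
    """
    best, _ = min(rated, key=lambda pc: pc[1])
    offspring = []
    for _ in range(len(rated)):
        child = tuple(rng.choice(dom) if rng.random() < p_mut else value
                      for value, dom in zip(best, domains))
        offspring.append(child)
    return offspring
```

With `p_mut=0.0`, the function degenerates to returning $n$ unchanged copies of the best point, which makes the selection step easy to check in isolation.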

Example 3.6

A well-known local search technique for solving hard combinatorial problems is
simulated annealing [2]. Here, the algorithm decides, based on a single result object,
which new design point in the neighborhood will be investigated next:
$\mathit{opt}: (P \times C)^{1} \rightarrow P^{1}$. Hence, the exploration simply describes a path in the design
space.
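Example 3.6 can be sketched as a stateful function object that consumes one rated design point and returns one neighbor. The details are standard simulated-annealing choices assumed for illustration; the text does not specify them:

- the Metropolis acceptance rule;
- geometric cooling;
- a neighborhood that changes a single parameter.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]


class AnnealingOpt:
    """opt : (P x C)^1 -> P^1 in the spirit of Example 3.6 (minimization)."""

    def __init__(self, domains: Sequence[Sequence[int]], rng: random.Random,
                 t0: float = 10.0, alpha: float = 0.9):
        self.domains = domains
        self.rng = rng
        self.t = t0          # current temperature
        self.alpha = alpha   # geometric cooling factor
        self.current: Optional[Tuple[Point, float]] = None  # accepted (p, c)

    def __call__(self, rated: Sequence[Tuple[Point, float]]) -> List[Point]:
        (p, c), = rated      # exactly one rated design point per call
        if (self.current is None or c <= self.current[1]
                or self.rng.random() < math.exp((self.current[1] - c) / self.t)):
            self.current = (p, c)  # Metropolis acceptance
        self.t = max(self.t * self.alpha, 1e-9)  # cool down, keep t > 0
        neighbour = list(self.current[0])
        i = self.rng.randrange(len(neighbour))
        neighbour[i] = self.rng.choice(self.domains[i])  # change one parameter
        return [tuple(neighbour)]
```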

3.5 Exploration manager

The exploration manager has purely administrative tasks. To perform the exploration
for a given synthesis tool, the exploration manager has to properly invoke the
corresponding functions. As exploration is an iterative process, the obvious idea is to
have a loop in the manager module that performs three steps. It rates previously generated
results using the cost function ($\mathit{cost}$). It then starts the optimization module with these
ratings and the corresponding parameter sets ($\mathit{opt}$). Finally, it calls $\mathit{synth}$ for each new
design point to be explored. Conceptually, this means the successive execution of the
functions $\mathit{cost}$, $\mathit{opt}$ and $\mathit{synth}$ in each iteration. In general, this is not
straightforward. If $\mathit{opt}$ needs $\nu$ different (cost, parameter) tuples to
produce $\omega$ new parameter sets, the function $\mathit{cost}$ must be invoked $\nu$ times, and
after that $\mathit{synth}$ must be executed $\omega$ times. Additionally, the values $\nu$ and $\omega$ are
generally not constant, and the exploration process must take care of that.
In the following, we describe how this problem is solved in the implementation of
EXPLORA.
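The bookkeeping problem described above ($\nu$ rated results per call of opt, $\omega$ new points per call) can be handled with two buffers. The following sequential Python sketch is an assumption about how such a loop may look, not EXPLORA's Java implementation. For simplicity $\nu$ is fixed here, while $\omega$ may vary from call to call.

```python
from typing import Callable, List, Sequence, Tuple


def explore(synth: Callable, cost: Callable, opt: Callable,
            initial_points: Sequence, nu: int, max_evals: int) -> List[Tuple]:
    """Sequential sketch of the exploration manager loop.

    Design points wait in `pending`; each one is synthesized and rated.
    Whenever nu rated points have accumulated, they are handed to opt,
    which may return any number (omega) of new design points.
    """
    pending = list(initial_points)
    rated = []    # (p, c) pairs not yet consumed by opt
    archive = []  # every (p, q) synthesized so far
    while pending and len(archive) < max_evals:
        p = pending.pop(0)
        q = synth(p)
        archive.append((p, q))
        rated.append((p, cost(q)))
        if len(rated) >= nu:
            batch, rated = rated[:nu], rated[nu:]
            pending.extend(opt(batch))
    return archive
```

For instance, with an `opt` that returns one successor for every rated point, the loop alternates between single synthesis runs and optimizer calls until `max_evals` runs have been performed.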
4. EXPLORA - a Tool for Versatile Design Space Exploration

This section deals with the implementation of EXPLORA. The main issues are the 
principal process of exploration and the structure of the exploration manager module. 

4.1 EXPLORA program structure

In addition to the previously described exploration manager, EXPLORA consists of
two further parts. The first is a GUI (graphical user interface), where the user can visualize
the exploration results and choose a single implementation for further evaluation. The second
is a JAVA-based remote method invocation (RMI) interface between the GUI and the exploration
manager, which allows different users to interact with the exploration manager at the same
time. That way, a single instance of an exploration manager can administer all accessible
tool starter instances on a network. This central instance is responsible for apportioning
all incoming exploration tasks to the registered tool manager services.

To be able to execute several instances of a synthesis tool simultaneously on 
different computers of a network in order to speed up the exploration process, there 
is a second RMI interface between the exploration manager and the different tool 
starter instances. Although belonging to the EXPLORA tool suite, the tool manager 
objects are started on the machines providing the tool services needed for 
exploration. All tool starter instances are registered at the central exploration 
manager service. 

To allow flexible adaptation of the optimization algorithm, a language
interface is introduced as shown in Fig. 1, which makes it possible to implement
these algorithms in arbitrary programming languages. This way it would, for
instance, be possible to implement the cost function object in a scripting
language like Python. The designer could then reconfigure it on the fly without
recompiling the EXPLORA tool suite.

4.2 EXPLORA management process 

The exploration manager is able to serve several GUIs simultaneously by using one
optimization module for each GUI and distributing the generated exploration tasks to
several tool starters. Task queues are used to serve requests coming from
different users without resource collisions. The management cycle consists of a
periodic thread. It receives a new parameter set from the optimization module, if one is
available, and adds it to the appropriate task queue. The
parameter set is forwarded to a tool as soon as the related tool instance is able to
process a new request. Afterwards, the result is forwarded both to the optimization
module for generation of the next parameter set and, at the same time, to the GUI to
be displayed.
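The task-queue idea of the management process can be illustrated with threads standing in for remote tool starters. This is a simplified local sketch using Python's standard `queue` and `threading` modules. It replaces the periodic manager thread and the RMI interface with a single batch of tasks. The function name `run_on_tool_starters` is hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple


def run_on_tool_starters(points: Iterable, synth: Callable,
                         n_starters: int = 3) -> List[Tuple]:
    """Distribute design points over n_starters worker threads via a task queue."""
    tasks: "queue.Queue" = queue.Queue()
    results: "queue.Queue" = queue.Queue()
    for p in points:
        tasks.put(p)

    def tool_starter() -> None:
        # Each worker plays the role of one (remote) tool starter instance.
        while True:
            try:
                p = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this starter
            results.put((p, synth(p)))

    workers = [threading.Thread(target=tool_starter) for _ in range(n_starters)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Results arrive in completion order rather than submission order. This is why a manager has to keep the association between each result and its parameter set, as the `(p, synth(p))` pairs do here.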

4.3 Graphical User Interface 

One of the main requirements for the exploration program is the ability to visualize
the exploration results. Therefore, the graphical user interface of EXPLORA shown
in Fig. 2 mainly consists of a two-dimensional sheet where, for each explored point
$p$, its result tuple $q = \mathit{synth}(p)$ is displayed. The sheet scales automatically depending
on the result values discovered so far. There is an additional text field (left)
where the parameters of a currently selected result object can be displayed in textual
form. For result tuples of dimension $l \ge 3$, the user must choose 2 out of $l$
quantities to be displayed.



5. Case Study: Design Space Exploration During High-Level 
Synthesis 

Note that the following example of high-level synthesis is just one example where 
the above concepts have been applied successfully in the context of embedded 
system synthesis. Others are hardware/software partitioning (system-level 
exploration) [1] and the automatic exploration of task mappings in the context of 
massively parallel processor arrays. 

5.1 Problem specification

The workflow of the exploration process using EXPLORA will be demonstrated by 
the synthesis of a single-chip embedded system design based on behavioral VHDL 
with an FPGA as target architecture. The typical design flow includes three steps: 

1. Specification and implementation of the design in VHDL.

2. Compilation into a netlist format via an RT-level synthesis tool. 

3. Technology mapping, place and route using another tool provided by the 
FPGA vendor. 

In this case study, the Synopsys Behavioral Compiler is used for compiling the 
VHDL code into a netlist format. The FPGA data stream is created by the Xilinx 
Alliance Tools for a Xilinx 4028EXWGX as target architecture. 

Both tool suites provide a wide range of possible options strongly influencing the 
design quality. The designer has to decide, e.g., which scheduling strategy the VHDL
compiler should use for minimum latency, with unknown effects on the chip area, or
whether the mapping tool should optimize mainly for speed or area, or look for a balanced
solution. Additionally, it must be possible to explore how certain design constraints,
like net delay constraints specified in so-called user constraint files, influence the
quality of the results. Since a complete design run from code compilation to a
downloadable data stream may take some minutes or more, testing all possible
parameter sets by hand is not feasible. Instead, EXPLORA is used to explore
solutions in a stand-alone program run overnight, using all workstations available to
it on the network. The designer then chooses the most suitable solution afterwards.

Example 5.1

As an example, we consider a well-known differential equation benchmark from
high-level synthesis. The benchmark solves the differential equation $y'' + 3xy' + 3y = 0$
in the interval $[x, a]$ with the Euler method, using step size $dx$ and initial values
$y(x) = \texttt{y\_in}$ and $y'(x) = \texttt{u\_in}$. Its behavioral VHDL specification is given as follows:
    ENTITY dgl IS
      PORT(x_in, y_in, u_in, dx_in, a_in: IN REAL;
           activate: IN BIT; y_out: OUT REAL);
    END dgl;

    ARCHITECTURE behavioral OF dgl IS BEGIN
      PROCESS (activate)
        -- u holds the derivative y'
        VARIABLE x, y, u, dx, a, x1, u1, y1: REAL;
      BEGIN
        x := x_in; y := y_in; u := u_in; dx := dx_in; a := a_in;
        LOOP
          -- one forward Euler step
          x1 := x + dx;
          u1 := u - (3.0 * x * u * dx) - (3.0 * y * dx);
          y1 := y + (u * dx);
          x := x1; u := u1; y := y1;
          EXIT WHEN x1 > a;
        END LOOP;
        y_out <= y;
      END PROCESS;
    END behavioral;
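Before the exploration is started, the functional behavior of the specification can be cross-checked against a software reference model. The following Python mirror of the `dgl` process is a hypothetical aid for such a functional test. It performs the same forward-Euler updates and the same exit test as the VHDL loop.

```python
def diffeq(x: float, y: float, u: float, dx: float, a: float) -> float:
    """Python mirror of the dgl process: Euler steps for y'' + 3xy' + 3y = 0.

    u holds y'. As in the VHDL loop, all updates use the old values,
    and the loop exits once x1 > a.
    """
    if dx <= 0:
        raise ValueError("step size dx must be positive")
    while True:
        x1 = x + dx
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + (u * dx)
        x, u, y = x1, u1, y1
        if x1 > a:
            return y
```

For instance, `diffeq(0.0, 1.0, 0.0, 0.5, 1.0)` performs three steps and returns -0.6875.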

After a functional test of the VHDL specification, EXPLORA can be started. It
generates a set of design parameters and inserts them into a makefile. The makefile provides
a set of rules for starting the appropriate tools with the correct syntax and order of
execution. Additionally, this makefile generates the constraint files needed by the
synthesis tools. After a single synthesis run has been completed, the same makefile
takes care of extracting the results from the log files generated during synthesis, using
standard UNIX shell tools. It hands these results back to EXPLORA, which then
adapts the parameter set for the next synthesis run if necessary. Multiple synthesis
processes can be executed in parallel on different machines if available.
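The tool starter role played by the makefile can be sketched in Python: run the tools, then extract the result quantities from the log files. Everything tool-specific here is hypothetical, because the text does not give the actual Synopsys or Xilinx report formats:

- the make target `explore`;
- the variable naming;
- the log file `synthesis.log`;
- the regular expressions.

```python
import re
import subprocess
from typing import Dict

# Hypothetical report lines; real Synopsys/Xilinx logs need their own patterns.
PATTERNS = {
    "area": re.compile(r"Total area:\s*([0-9.]+)"),
    "clock_period_ns": re.compile(r"Minimum period:\s*([0-9.]+)\s*ns"),
}


def extract_results(log_text: str) -> Dict[str, float]:
    """Extract the result quantities q from the text of a synthesis log."""
    results = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match is None:
            raise ValueError(f"quantity {name!r} not found in log")
        results[name] = float(match.group(1))
    return results


def run_design_point(params: Dict[str, str], workdir: str = ".") -> Dict[str, float]:
    """Hypothetical tool starter: pass parameters to make, then parse the log.

    The target 'explore' and the file 'synthesis.log' are placeholders.
    """
    cmd = ["make", "explore"] + [f"{k.upper()}={v}" for k, v in params.items()]
    subprocess.run(cmd, cwd=workdir, check=True, capture_output=True, text=True)
    with open(f"{workdir}/synthesis.log") as f:
        return extract_results(f.read())
```

Separating parsing from tool invocation keeps the extraction step testable on saved logs, independently of the synthesis tools.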

5.2 Optimization algorithm 



In this case study, the following five parameters of the synthesis process are varied 
during exploration: 



| Parameter                  | Affected tool | Possible values           |
|----------------------------|---------------|---------------------------|
| I/O mode                   | Synopsys BC   | superstate, free_float    |
| Area opt.                  | Synopsys BC   | yes, no                   |
| Cover mode                 | Xilinx Mapper | area, speed, balance, none|
| Logic opt. effort          | Xilinx Mapper | normal, high, none        |
| Delay-based cleanup passes | Xilinx Placer | [0,5]                     |



Here, a simple greedy algorithm is used to sequentially generate parameter sets 
out of the possible 288 parameter combinations as follows: 

let mincost := ∞
Generate random initial parameter set p = (p_1, ..., p_n)

loop
    let changed := false
    for i = 1...n do
        Randomly change parameter p_i
        let q := synth(p)
        if cost(q) < mincost then
            Keep last parameter set change
            let mincost := cost(q)
            let changed := true
        else
            Reject last parameter set change
        end if
    end for
    if changed = false then
        Exit loop
    end if
end loop
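The greedy pseudocode above can be turned into a runnable Python sketch. The domains, `synth` and `cost` are passed in as arguments; this interface is hypothetical. A change is kept only if it strictly lowers the cost. The search stops after a full pass over all $n$ parameters without improvement.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def greedy_explore(domains: Sequence[Sequence], synth: Callable, cost: Callable,
                   rng: random.Random) -> Tuple[tuple, float, List[tuple]]:
    """Greedy parameter-wise exploration following the pseudocode."""
    p = [rng.choice(dom) for dom in domains]  # random initial parameter set
    mincost = math.inf
    history = []                              # every (p, q) synthesized
    while True:
        changed = False
        for i, dom in enumerate(domains):
            old = p[i]
            p[i] = rng.choice(dom)            # randomly change parameter p_i
            q = synth(tuple(p))
            history.append((tuple(p), q))
            if cost(q) < mincost:
                mincost = cost(q)             # keep the change
                changed = True
            else:
                p[i] = old                    # reject the change
        if not changed:
            return tuple(p), mincost, history
```

Because the cost strictly decreases with every kept change and the design space is finite, the loop always terminates. The result is a local optimum and not necessarily a Pareto-optimal point.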

More sophisticated optimization algorithms like simulated annealing or 
evolutionary algorithms can easily be incorporated. 

5.3 Results 

Fig. 2 shows a screenshot of EXPLORA after 16 synthesis runs. The quantities
important for rating the synthesis results are the estimated area consumption (displayed
on the x axis) and the minimum possible FPGA clock period (displayed on the y axis).
Additionally, the latency of the design could be extracted from the output of the
scheduler and displayed alternatively together with any other result dimension.







Figure 2: Result of the exploration process displayed in EXPLORA 



In Fig. 2, two Pareto-optimal points are shown: 



| No. | Max. FPGA clock freq. | Area units |
|-----|-----------------------|------------|
| 1   | 8.88 MHz              | 581        |
| 2   | 7.83 MHz              | 576        |



The parameter set used to generate solution (1) was p = (superstate, yes, balance,
high, 2). The remaining 14 results require a larger chip area of 592 units. Six solutions
are not visible in the display shown in Fig. 2 because they are covered by other
solutions having the same result values.

One synthesis run takes approximately 8 minutes on a Sun UltraSparc 60, so the
sequential exploration process can be done in about two hours. It leads to much
better results than randomly choosing tool parameters. An exhaustive search of about
39 hours may also be feasible. If the optimization algorithm supports parallel
evaluation of different solutions, as evolutionary algorithms do, for example, the exploration
process can be accelerated using several workstations.
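The run-time figures quoted above follow from the parameter table and the 8-minute run time. The calculation assumes that the delay-based cleanup-pass parameter takes the six integer values 0 to 5:

```latex
|P| = \underbrace{2}_{\text{I/O mode}} \cdot \underbrace{2}_{\text{area opt.}} \cdot
      \underbrace{4}_{\text{cover mode}} \cdot \underbrace{3}_{\text{effort}} \cdot
      \underbrace{6}_{\text{cleanup passes}} = 288

16 \times 8\ \text{min} = 128\ \text{min} \approx 2.1\ \text{h}, \qquad
288 \times 8\ \text{min} = 2304\ \text{min} = 38.4\ \text{h} \approx 39\ \text{h}
```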



6. Conclusions 

We have presented an approach for flexible exploration of design spaces that are 
spanned by parameter ranges of synthesis tools involved during the development of 
embedded systems. Due to space limitations, we were only able to give one example
of a typical abstraction level, namely high-level architectural synthesis, where the
concepts of EXPLORA apply. Other levels include the system level, where
hardware/software implementation decisions are taken and tasks are mapped to either
hardware or software [1], or the real-time software synthesis level, where a set of
tasks has to be scheduled under real-time constraints. The main strength of our
approach is the flexibility in exchanging synthesis tool, cost function and
optimization algorithm at each level.

In the future, we would like to show that, using the presented approach,
hierarchical design space exploration also becomes possible. For example, a tree of
differently configured EXPLORA processes may be started at different levels of 
abstraction and synchronized appropriately such that exploration results on lower 
levels of abstraction may be passed to parent processes, e.g., by the introduction of 
appropriate combining cost functions (sum, maximum, etc.). 
AUTOMATIC CODE GENERATION FOR
MULTIRATE SIMULINK MODELS WITH 
SUPPORT FOR THE OSEK REAL-TIME 

OPERATING SYSTEM 

C. Homburg, U. Kiffmeier, L. Köster
dSPACE GmbH, Paderborn, Germany 



This paper presents a block-diagram based approach for 
automatic code generation for embedded applications. The basis 
is a multirate Simulink/Stateflow model that is extended with 
implementation-specific information provided by special blocks. 
The OSEK standard is chosen as the target real-time operating 
system. The individual tasks and the inter-task communication 
have to be implemented efficiently by the code generator; the
main mechanisms are discussed. The presented approach will be
implemented in release 2.0 of the production code generator 
TargetLink. 



Introduction 

The software development for embedded systems is increasingly being done with the 
help of simulation tools and block diagram specifications. MATLAB, Simulink and 
Stateflow are well-accepted products in this area. Production code generators like 
TargetLink (dSPACE, 2000) are applied to transfer such graphical specifications of 
real-time control algorithms into highly efficient, readable and reliable C code that
fits into an Electronic Control Unit (ECU). 

When this technique was first introduced, users started with pilot projects to gain 
experience with automatic code generation. The Simulink model covered only 
relatively small parts (features) of the whole control system, for example, the idle 
speed control of an engine. Typically, there was only a single sample rate, which
simplified the task of the code generator. Code was generated for each RTOS task separately.
Many projects have been carried out successfully with this pragmatic approach, but
the problem remained that users had to integrate the auto-generated code manually
into the existing real-time kernel on their ECU.
With the increasing confidence in automatic code generation, there is a trend to
develop larger models of complete ECUs. Such models contain many features and 
sample rates, which have to be executed as different tasks of the ECU RTOS. 

This contribution presents the TargetLink approach to automatic code generation 
from multirate Simulink models: The function developer creates a Simulink model 
containing all the features that have to be implemented. This model concentrates on 
the control algorithms; implementation aspects are not taken into account. In the next
step, the control algorithms have to be implemented. The functional model is
automatically converted into a TargetLink implementation model. The conversion 
process is bidirectional. Implementation-specific model properties are preserved and 
stored when the implementation model is converted back to a pure Simulink model. 
TargetLink provides a special block set for the specification of implementation 
aspects. Within these blocks the software engineer can make all the settings for the 
implementation. Finally, production-ready code including the RTOS configuration for
the OSEK real-time operating system family is automatically generated from the
TargetLink implementation model.

The approach presented here is currently under development and will be part of 
the next major release of TargetLink. 



OSEK Real-Time Operating System 

The presented approach is based on the OSEK real-time operating system family, 
which was established as a standard for automotive ECUs. Many European 
automotive companies have committed themselves to this standard for new 
development projects, because OSEK’s high scalability and flexibility support the
portability and reusability of embedded application software.

The OSEK specification describes a static real-time operating system (OSEK,
2000). All operating system objects such as tasks, events, messages and resources are
created at compile time. Their attributes are described offline with the help of the
OSEK Implementation Language (OIL). OSEK defines several standard attributes,
such as the priority of the object TASK, which must be supported by all OSEK
operating system implementations. Some attributes are defined to ensure scalability
of the operating system. Beyond these, the OSEK operating system developer is allowed
to define additional implementation-specific attributes for OSEK objects. Based on
the OIL description, a highly efficient real-time kernel is generated that fits the user's
needs while eliminating all overhead for features not required by the current
application.

An OSEK operating system uses priority-based scheduling. In addition, the 
priority ceiling protocol is included for resource handling. The use of resources is
similar to the use of semaphores in other operating systems, except that the priority
ceiling protocol avoids deadlocks during resource occupation. Resources can be used 
to manage concurrent accesses of tasks and interrupt service routines with different 
priorities to shared resources like memory or hardware units. 
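To make the resource mechanism concrete, the following C sketch models how the priority ceiling protocol prevents a resource holder from being preempted by other tasks that use the same resource. The types and functions (Task, Resource, get_resource(), release_resource(), can_preempt()) are hypothetical stand-ins, not the OSEK API, which provides GetResource() and ReleaseResource() for this purpose. The ceiling is assumed to be the highest priority of all tasks sharing the resource, and a larger number is assumed to mean a higher priority.

```c
#include <stddef.h>

/* Hypothetical task model: static priority plus the priority the task
 * currently runs at (raised while it holds a resource). */
typedef struct {
    int base_priority;
    int current_priority;
} Task;

/* Hypothetical resource model with a statically computed ceiling. */
typedef struct {
    int ceiling;          /* highest priority of all tasks using it */
    int saved_priority;   /* holder's priority before acquisition */
    Task *holder;
} Resource;

/* Acquire: raise the caller to the ceiling so that no other task that
 * shares this resource can preempt it while the resource is held. */
static void get_resource(Task *t, Resource *r) {
    r->saved_priority = t->current_priority;
    r->holder = t;
    if (r->ceiling > t->current_priority)
        t->current_priority = r->ceiling;
}

/* Release: restore the priority the task had before acquisition. */
static void release_resource(Task *t, Resource *r) {
    t->current_priority = r->saved_priority;
    r->holder = NULL;
}

/* A ready task preempts the running one only with strictly higher priority. */
static int can_preempt(const Task *candidate, const Task *running) {
    return candidate->current_priority > running->current_priority;
}
```

Because the holder runs at the ceiling priority, no other user of the resource can start while it is held, so a circular wait, and therefore a deadlock, cannot arise between tasks sharing it.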
As an extension of the OSEK standard, the Communication Services (OSEK/COM)
and the Network Management (OSEK/NM) are provided to support distributed 
embedded systems. 



Modeling of Multirate Systems 

A typical controller model contains dozens of features each implementing a specific 
control function. Each feature may have parts which have to be executed at different 
sample rates, for example, in a 10 ms and a 100 ms time frame. Such parts are 
modeled as separate Simulink subsystems. Each of the Simulink subsystems contains 
only blocks with the same sample rate. The sample rate is specified at the Inport 
blocks or at discrete-time blocks within the subsystem. Many such subsystems may
be wired together to form a feature, and the features in turn form the whole control system.




Figure 1. A typical Simulink/Stateflow diagram.

Figure 1 shows a typical Simulink/Stateflow diagram that contains cyclic and
event-driven parts. In an engine control application, for example, the electric throttle
control algorithm executes periodically and the ignition/injection part is driven by 
crankshaft events. In Simulink, asynchronous functions are modeled as triggered 
subsystems. 

TargetLink provides a TASK block to declare a Simulink subsystem as an RTOS task
and to specify its properties, for example,

• whether it is preemptive or not,

• the events that are owned by the task,

• the maximum number of activations,

• the task priority, etc.

If a Simulink subsystem is expected to become part of a specific task, a TASK block 
has to be placed into the subsystem (Figure 2). If the user makes no explicit settings,
TargetLink uses a default partitioning. By default, all the model’s periodic 
subsystems that have the same sample rate are grouped together in a common task. 
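The default grouping rule can be illustrated with a short C sketch that assigns every periodic subsystem to a task and reuses a task whenever a subsystem with the same sample time has already been seen. The Subsystem type and default_partitioning() function are hypothetical; the partitioning in TargetLink also has to respect TASK blocks placed by the user, which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical description of a periodic subsystem in the model. */
typedef struct {
    const char *name;
    int sample_time_ms;
    int task_id;          /* filled in by the partitioning step */
} Subsystem;

/* Assign every subsystem to a task; subsystems with equal sample times
 * share one task. task_rate_ms receives the sample time of each task.
 * Returns the number of tasks, or -1 if task_rate_ms is too small. */
static int default_partitioning(Subsystem *subs, size_t n,
                                int *task_rate_ms, size_t max_tasks) {
    size_t num_tasks = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t t;
        /* Look for an existing task with the same sample time. */
        for (t = 0; t < num_tasks; ++t)
            if (task_rate_ms[t] == subs[i].sample_time_ms)
                break;
        if (t == num_tasks) {            /* no match: open a new task */
            if (num_tasks == max_tasks)
                return -1;
            task_rate_ms[num_tasks++] = subs[i].sample_time_ms;
        }
        subs[i].task_id = (int)t;
    }
    return (int)num_tasks;
}
```

For three subsystems with sample times of 10 ms, 100 ms and 10 ms, the function creates two tasks, and the two 10 ms subsystems end up in the same task.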
Figure 2. Specification of tasks with the Task block.

Within this task, the code for all subsystems with the same sample rate is executed.
Triggered subsystems with common trigger sources are combined, and their code is
assigned either to the trigger source’s task, if that task is also within the code
generator's scope (FC_System in Figure 1 is associated with task 100ms), or to a
separate task (Crankshaft in Figure 1).
Figure 3. The Task block graphical user interface.

Figure 3 shows a screenshot of the TASK block dialog. In this case, the Simulink
subsystem is declared as part of the T_10ms task, which is activated by Event2 with
priority 4. Only one instance of the task can be activated at a time, and it is not
activated automatically after startup.

If there are several subsystems in a model with the same sample time, they can be
combined into a single RTOS task. This leads to a more efficient real-time
implementation by reducing the scheduler overhead for managing many tasks
separately. Alternatively, there is an option to define the execution sequence of tasks
explicitly using the OSEK ChainTask() command. Additionally, even multirate
subsystems can be assigned to a single RTOS task. The only restriction is that all
their sample times must be integer multiples of a common base sample time. In this
case, a counter variable is used to control the execution of the slower sample rates.
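A minimal sketch of such a counter, assuming a 10 ms base rate and a 100 ms slow rate, might look as follows. The step functions and task_multirate() are hypothetical placeholders for generated code; in a real OSEK task, the body would be activated by a cyclic alarm and would end with TerminateTask().

```c
/* Hypothetical generated step functions for two subsystems. */
static int fast_runs, slow_runs;
static void subsystem_10ms(void)  { ++fast_runs; }
static void subsystem_100ms(void) { ++slow_runs; }

/* Both sample times are integer multiples of the 10 ms base period. */
#define BASE_PERIOD_MS 10
#define SLOW_PERIOD_MS 100
#define SLOW_DIVIDER (SLOW_PERIOD_MS / BASE_PERIOD_MS)   /* = 10 */

static unsigned slow_counter = 0;

/* Body of a single RTOS task that is activated every 10 ms. */
void task_multirate(void) {
    subsystem_10ms();                /* runs on every activation */
    if (slow_counter == 0)
        subsystem_100ms();           /* runs on every 10th activation */
    if (++slow_counter == SLOW_DIVIDER)
        slow_counter = 0;
}
```

After 25 activations, the fast subsystem has run 25 times and the slow one 3 times (at activations 0, 10 and 20).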

As an alternative to the TASK block, the ISR block can be used to implement a 
subsystem as an interrupt service routine. This is typically used for short time-critical 
actions only and avoids scheduler overhead. 



Inter-Task Communication 

In Simulink, data exchange is modeled by signal lines. In a task-driven preemptive 
environment, data exchange becomes a more complex issue. In an RTOS environment,
data exchange can be implemented by communication via operating system services,
e.g. OSEK Messages. Alternatively, data exchange can be implemented via variables,
with or without RTOS support for resource handling.
Figure 4. Communication between tasks.

The signal lines between the tasks establish a data flow. By default, TargetLink
analyzes the data flow to determine the optimal implementation. For example, the
sample rates and execution order are evaluated to determine whether double-buffering
is necessary on the sender or receiver side of a message. Omitting unnecessary buffers
saves RAM.

Figure 4 shows a scenario where tasks with different sample rates, priorities and
preemptibilities exchange data. Task A sends signal1 and signal2 to Task B. Global
variables that can be accessed by both tasks are usually used for this data exchange.
Since Task A is preemptive (in OSEK terms, SCHEDULE=FULL) and Task B has a
higher priority, Task A may be interrupted after signal1 has already been updated
but while signal2 still has its old value. Consequently, Task B would work with
values of signal1 and signal2 from different time steps. This inconsistency can
lead to unwanted behavior and normally has to be avoided. In this example, Task A is
responsible for implementing a mechanism that protects against such
inconsistencies. One possible solution is to use local copies of signal1 and signal2.
At the end of Task A, one compact block of code copies this local representation to
the corresponding global variables. It still has to be ensured that Task A is not
interrupted during this copying procedure. This can be done by disabling interrupts
or by using an OSEK resource. The introduction of local working copies ensures that
this critical section is as short as possible.
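The local-copy scheme for Task A can be sketched in C as follows. enter_critical() and exit_critical() are hypothetical stubs that stand for whichever protection mechanism is chosen, e.g. DisableAllInterrupts()/EnableAllInterrupts() or GetResource()/ReleaseResource() in OSEK; here they only count nesting so that the sketch runs on a host. The computation in task_a_step() is an arbitrary placeholder.

```c
/* Globals shared between Task A (sender) and Task B (receiver). */
static volatile int g_signal1, g_signal2;

/* Stand-ins for the protection mechanism; they only track nesting. */
static int critical_depth;
static void enter_critical(void) { ++critical_depth; }
static void exit_critical(void)  { --critical_depth; }

/* One step of Task A (hypothetical computation). */
void task_a_step(int input) {
    /* 1. Compute into local working copies. The globals are untouched,
     *    so if Task B preempts here it still reads a consistent pair
     *    from the previous time step. */
    int signal1 = input * 2;
    int signal2 = input + 1;

    /* ... further processing that may take a long time ... */

    /* 2. Publish both values in one short critical section. */
    enter_critical();
    g_signal1 = signal1;
    g_signal2 = signal2;
    exit_critical();
}
```

Only the two assignments are protected, so interrupts (or the resource) are held for a few instructions instead of for the whole computation.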
Figure 5. The specification of inter-task communication.

Whether the sender or the receiver is responsible for implementing inconsistency
protection depends on the task settings and the possible interruption scenarios. In
Figure 4, signal3 is also potentially exposed to inconsistency. Although it is a single
data line, it represents a vector, and access to it could be interrupted. However, in
the particular case of Figure 4, the sender cannot be interrupted by the receiver
because Task A has a higher priority than Task C. Vice versa, Task C cannot be
preempted by other tasks at all because it is not preemptive, i.e. SCHEDULE=NON.
In this case, both tasks can use the global variable for signal3 directly, which is of
course the most efficient way of data exchange.
Operating systems usually provide a means to exchange protected information
between tasks. OSEK defines messages as a common interface for protected intra-ECU
communication and for inter-ECU communication based on a common bus. For
data exchange between tasks on the same ECU, these messages are not always the 
most efficient implementation since the automatically implemented inconsistency 
protection is sometimes not necessary, as shown in the last example. By default, 
TargetLink therefore automatically selects the most efficient implementation (global 
variables with or without copies, interrupt lock, resources etc.) depending on the task 
attributes priority and preemptibility. 
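The reasoning behind the two Figure 4 cases can be condensed into a small decision function. This C sketch is a simplification under stated assumptions: only preemption between the two tasks is considered (interrupt service routines and explicit rescheduling points are ignored), a larger number means a higher priority, and tasks of equal priority do not preempt each other. The names are hypothetical and do not describe TargetLink's actual analysis.

```c
#include <stdbool.h>

/* Task attributes relevant for the decision (hypothetical names). */
typedef struct {
    int priority;      /* larger value = higher priority */
    bool preemptive;   /* SCHEDULE=FULL (true) or SCHEDULE=NON (false) */
} TaskAttr;

typedef enum {
    PROTECT_NONE,      /* both tasks may use the global variable directly */
    PROTECT_SENDER,    /* sender publishes via local copies + critical section */
    PROTECT_RECEIVER   /* receiver copies the value inside a critical section */
} Protection;

/* Can 'victim' be preempted by 'other' while it runs? */
static bool can_be_interrupted_by(TaskAttr victim, TaskAttr other) {
    return victim.preemptive && other.priority > victim.priority;
}

/* Decide who must guard a multi-element signal written by 'sender' and
 * read by 'receiver'. At most one side can be exposed, because both
 * strict priority comparisons cannot hold at the same time. */
Protection required_protection(TaskAttr sender, TaskAttr receiver) {
    if (can_be_interrupted_by(sender, receiver))
        return PROTECT_SENDER;    /* receiver may read a half-written value */
    if (can_be_interrupted_by(receiver, sender))
        return PROTECT_RECEIVER;  /* sender may overwrite during a read */
    return PROTECT_NONE;
}
```

With example priorities chosen to match the relations described for Figure 4 (Task A = {2, true}, Task B = {3, true}, Task C = {1, false}), the function returns PROTECT_SENDER for the A-to-B signals and PROTECT_NONE for the A-to-C signal.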

The default behavior can be overruled by explicit user settings. For instance, this 
might become necessary if the user wants to ensure that these task settings will 
remain unchanged after code generation or if a signal goes outside the scope of the 
code generator. An example of the possible settings for data exchange is shown in 
Figure 5. 



Special OSEK Blocks 

Special blocks that are not included in the normal Simulink blockset provide a 
specification interface to specific OSEK features and reproduce their behavior during 
simulation. The Alarm/Counter block is given as an example. 
Figure 6. Alarm and Counter blocks.

In OSEK, alarms are automatically used to set up time-periodic tasks. In addition, 
TargetLink provides a special block that makes the general alarm functionality of 
OSEK available to the user. Since each OSEK alarm is assigned to a counter, the 
TargetLink Alarm block is always combined with a special Counter block that was 
designed especially for this purpose (Figure 6). Although the alarm and the counter 
always form a unit, they are divided into two blocks because one counter can have 
several associated alarms. Code for the counter itself will usually not be generated
because a hardware counter/timer or an operating system counter will be used. In this 
context the TargetLink counter will be used for offline simulation only. 

Alarms are set up at system initialization time, e.g. for periodic tasks, or when a
certain event occurs at runtime. The second case is modeled by a special trigger
input that can be driven by any source within the model. Absolute or relative
alarms can be specified. The former expire at absolute counter values, whereas the
latter expire a certain number of (time-)ticks after the alarm was triggered.
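For offline simulation, the behavior of a counter with several alarms can be modeled in a few lines of C. The sketch below is an assumption-based model, not TargetLink's block implementation: the counter wraps after max_value, a relative alarm expires increment ticks after it is set, an absolute alarm expires when the counter reaches start, and a nonzero cycle makes the alarm periodic. The parameter names follow OSEK's SetRelAlarm() and SetAbsAlarm() services, but the functions themselves are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Software model of an OSEK-style counter (hypothetical names). */
typedef struct {
    unsigned value;       /* current tick count */
    unsigned max_value;   /* counter wraps to 0 after this value */
} Counter;

/* One alarm bound to a counter; several alarms may share a counter. */
typedef struct {
    Counter *counter;
    bool armed;
    unsigned expiry;      /* counter value at which the alarm expires */
    unsigned cycle;       /* 0 = single-shot, otherwise reload interval */
    unsigned fired;       /* expirations so far (stands in for the action) */
} Alarm;

/* Relative alarm: expires 'increment' ticks after it is set. */
static void set_rel_alarm(Alarm *a, unsigned increment, unsigned cycle) {
    unsigned range = a->counter->max_value + 1;
    a->expiry = (a->counter->value + increment) % range;
    a->cycle = cycle;
    a->armed = true;
}

/* Absolute alarm: expires when the counter reaches 'start'. */
static void set_abs_alarm(Alarm *a, unsigned start, unsigned cycle) {
    a->expiry = start;
    a->cycle = cycle;
    a->armed = true;
}

/* Advance the counter by one tick and process all alarms bound to it. */
static void counter_tick(Counter *c, Alarm *alarms, size_t n) {
    c->value = (c->value == c->max_value) ? 0 : c->value + 1;
    for (size_t i = 0; i < n; ++i) {
        Alarm *a = &alarms[i];
        if (!a->armed || a->counter != c || c->value != a->expiry)
            continue;
        a->fired++;
        if (a->cycle != 0)
            a->expiry = (a->expiry + a->cycle) % (c->max_value + 1);
        else
            a->armed = false;   /* single-shot alarm is done */
    }
}
```

Keeping the alarms in a separate array mirrors the split into two blocks: one counter drives any number of alarms.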



OSEK Configuration and Tool Interaction

Normally, not all parts of an ECU application are auto-generated. Some tasks may be 
hand-coded and have to be integrated with tasks generated by TargetLink. In this case
there are two sources of OIL descriptions which have to be merged. 

In order to ensure consistency, TargetLink maintains an OIL database that 
contains the complete RTOS configuration. An existing OIL description of legacy 
code can be imported into this database. When new code is generated, the RTOS 
configuration is written to an OIL file, which is loaded by the OSEK builder to 
generate the OSEK kernel for the ECU (see Figure 7). Simulink components can be 
assigned to OSEK objects that are in the OIL database already, or new entries can be 
created. 
Figure 7. Tool interaction.
Target-Specific Code Generation

As previously mentioned, the OSEK specification leaves a certain amount of 
flexibility to ensure optimal adaptation to different microcontrollers. As a 
consequence, different OSEK vendors implement some functionality in different 
ways, even if the implementation is for the same microcontroller. An example is the 
API function for counter access. There is no standard API function to read the system 
time. Therefore, the code for the same application appears slightly different for 
different OSEK implementations. Wherever possible, TargetLink uses standard
OSEK API functions in the generated code; where necessary, implementation-specific
API functions are used to ensure efficient application code.
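TargetLink performs this substitution in the code generator. A hand-written equivalent might hide the difference behind a single macro, as in the following C sketch. Both vendor functions and the SysTime type are hypothetical stubs that return fixed values; they are not the API of any real OSEK product.

```c
/* Hypothetical time type used by the application code. */
typedef unsigned long SysTime;

/* Hypothetical vendor-specific services (stubs returning fixed values). */
SysTime vendor_a_get_time(void) { return 1234ul; }
SysTime vendor_b_read_counter(int counter_id) {
    return counter_id == 0 ? 1234ul : 0ul;
}

/* Select the vendor at build time, e.g. with -DOSEK_VENDOR_B. */
#if defined(OSEK_VENDOR_B)
#define READ_SYSTEM_TIME() vendor_b_read_counter(0)
#else
#define READ_SYSTEM_TIME() vendor_a_get_time()
#endif

/* Application code only uses the macro, so it compiles unchanged
 * for either vendor. */
SysTime elapsed_since(SysTime start) {
    return READ_SYSTEM_TIME() - start;
}
```

A code generator can go one step further and emit the vendor-specific call directly, which avoids the indirection and keeps the generated code readable for each target.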

Another area of incompatibility is the set of OIL attributes available for OSEK
objects. Here the OS vendors usually provide many more options than described in the
standard. For example, tasks have several additional attributes, and the user might
want to specify these attributes at the same place where the standard attributes are
specified, i.e. in the TargetLink Task block. TargetLink therefore provides a
mechanism that identifies these extensions to the standard and lets the user enter the
desired options. The key to this feature is the OIL file. OSEK requires each
vendor to describe even the non-standard attributes by name and possible values in
OIL syntax. TargetLink analyzes this description and provides the necessary GUI,
where the user can, for example, specify the stack mechanism for a task.

When the user wants to port an application from one OSEK implementation to 
another, TargetLink automatically replaces the implementation-specific API calls and 
OIL attributes. This increases portability. Although OSEK was specified mainly to
facilitate the reuse of software, without automatic code generation these adaptations
would have to be done manually.



Simulation and Test of the Generated Code 

One of the major strengths of a block-diagram based approach is that it provides an 
executable specification of a control system. The behavior of the controller to be 
designed can be simulated offline on the PC, often using plant models to close the 
loop. Besides this Floating-Point Simulation Mode of the standard Simulink 
environment, TargetLink further provides the Production Code Host Simulation 
Mode and the Production Code Target Simulation Mode. 

In Production Code Host Simulation Mode, the TargetLink-generated (often
fixed-point) production code replaces the original Simulink blocks during simulation
(software-in-the-loop). Using this simulation mode, the generated code can be
validated and fixed-point arithmetic effects can be tested.

In Production Code Target Simulation Mode, the TargetLink-generated control 
algorithm is computed on a microcontroller that is identical to the processor on the 
production ECU (processor-in-the-loop). Simulink still provides the input signals for 
the controller. The signals are sent to and received from the target microcontroller by 
the serial interface of the PC. Using this simulation mode, final validation of the 
generated production code can be made under realistic closed-loop operating 
conditions. The output of the target compiler can be validated because it is part of the 
test loop. 

These features must still be supported when code is generated for OSEK. The
original production code, now including OSEK API calls, is to be used for
simulation. To achieve this, TargetLink provides mechanisms for simulation with and
without an OSEK operating system.

If no OSEK operating system is available on the target, TargetLink creates
additional code that implements, in a simple way, all OSEK macros and API
functions used by the generated application code. A simple scheduler substitute calls
the tasks by sending ‘activate task’ commands. While a task executes on the target,
the host waits for a response and the simulation time is halted. A preemptive
scheduler is therefore not needed.
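The target side of this mechanism might be sketched as a task table and a command handler. The sketch assumes that each ‘activate task’ command carries a task index and that the return value is sent back to the host as the response; the serial transport is omitted, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical generated task bodies. */
static int runs_10ms, runs_100ms;
static void Task_10ms(void)  { ++runs_10ms; }
static void Task_100ms(void) { ++runs_100ms; }

/* Task table of the scheduler substitute; the index is the task ID
 * that the host sends with an 'activate task' command. */
typedef void (*TaskFn)(void);
static const TaskFn task_table[] = { Task_10ms, Task_100ms };
#define NUM_TASKS (sizeof task_table / sizeof task_table[0])

/* Response codes sent back to the host (hypothetical protocol). */
enum { RESP_DONE = 0, RESP_BAD_ID = 1 };

/* Handle one command: run the requested task to completion, then
 * return the response for the host. */
int handle_activate_command(size_t task_id) {
    if (task_id >= NUM_TASKS)
        return RESP_BAD_ID;
    task_table[task_id]();
    return RESP_DONE;
}
```

Because each task runs to completion before the response is returned, and the host does not advance simulation time in the meantime, tasks never overlap; this is why no preemptive scheduler is required in this mode.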

If an OSEK operating system is available on the target, the simulation works as 
follows: The plant model on the host determines the simulation time. For a simulation 
step on the microcontroller, the system timer is disconnected from its hardware 
timebase. When the simulation time in Simulink increases, the host forces the target 
to set the system timer to the corresponding value, and the original OSEK task 
activation mechanism starts. After the task has finished, the computation results are 
sent back to the host. The communication with the host is accomplished by a
low-priority idle task.
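The host-driven timer can be modeled as follows: instead of a hardware interrupt advancing the system timer, the host supplies the new time, and the target replays each tick through the usual cyclic activation check. This C sketch is hypothetical; it counts activations instead of calling ActivateTask(), and it assumes a single cyclic alarm per task.

```c
/* Hypothetical model of a cyclically activated task on the target. */
typedef struct {
    unsigned period_ticks;   /* cyclic alarm period of the task */
    unsigned next_expiry;    /* timer value of the next activation */
    unsigned activations;    /* stands in for ActivateTask() */
} CyclicTask;

static unsigned system_timer;   /* no longer advanced by hardware */

/* Called by the idle-task communication handler when the host reports
 * that simulation time has advanced to 'new_time'. */
void host_set_system_timer(unsigned new_time, CyclicTask *tasks, int n) {
    while (system_timer < new_time) {
        ++system_timer;                      /* replay each tick */
        for (int i = 0; i < n; ++i) {
            if (system_timer == tasks[i].next_expiry) {
                tasks[i].activations++;      /* original activation path */
                tasks[i].next_expiry += tasks[i].period_ticks;
            }
        }
    }
}
```

Replaying every tick, rather than jumping straight to the new value, keeps the activation order identical to that of a free-running timer.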



Conclusion 

This paper has presented an automatic production-ready code generation approach 
for multirate block diagrams with real-time operating system support. The approach 
is based on quasi-standards in the automotive industry (i.e. Simulink, OSEK, C). The 
Simulink block diagram as the modeling basis holds all necessary information from 
the top level function development down to the ECU implementation, providing a 
system specification that is executable in all development phases. The presented 
approach fully integrates the OSEK configuration tools and the Simulink/TargetLink
model. It will be available as a commercial product in TargetLink release 2.0. 

Open issues that remain are the full support of ECU networks and WCET (worst case
execution time) calculations in order to verify that timing constraints are met.